NASA Contractor Report 4348 


TranAir: A Full-Potential,
Solution-Adaptive, Rectangular
Grid Code for Predicting
Subsonic, Transonic, and
Supersonic Flows About
Arbitrary Configurations


Theory Document


F. T. Johnson, S. S. Samant, M. B. Bieterman,
R. G. Melvin, D. P. Young, J. E. Bussoletti,
and C. L. Hilmes


CONTRACT NAS2-12513
DECEMBER 1992


NASA

















NASA Contractor Report 4348 


TranAir: A Full-Potential, 
Solution-Adaptive, Rectangular 
Grid Code for Predicting 
Subsonic, Transonic, and 
Supersonic Flows About 
Arbitrary Configurations 


Theory Document 


F. T. Johnson, S. S. Samant, M. B. Bieterman, 
R. G. Melvin, D. P. Young, J. E. Bussoletti, 
and C. L. Hilmes 

Boeing Military Airplane Company 
Seattle, Washington 


Prepared for 

Ames Research Center 

under Contract NAS2-12513 

NASA

National Aeronautics and 
Space Administration 

Office of Management 

Scientific and Technical 
Information Program 

1992 




Contents 


SUMMARY i 

1 INTRODUCTION 3 

1.1 MOTIVATION 3 

1.2 REPORT ORGANIZATION 8 

2 METHOD 9 

2.1 PROBLEM DEFINITION 9 

2.1.1 Governing Equation 9 

2.1.2 Boundary Conditions 9 

2.1.3 Variational Formulation 10 

2.1.4 Regions with Differing Total Properties 12 

2.2 OUTLINE OF THE METHOD 13 

2.3 DISCRETIZATION 15 

2.3.1 Boundary Representation 15 

2.3.2 Finite Computational Domain 15 

2.3.3 Computational Grid 17 

2.3.4 Finite Element Operators 18 

2.3.5 Grid Interfaces 23 

2.3.6 Modifications to the Bateman Principle 24 

2.3.7 Dissipation 25 

2.3.8 Accuracy of Discretization 28 

2.4 SOLUTION ALGORITHM 29

2.4.1 Linear Solution Algorithm 29 

2.4.2 Nonlinear Solution Algorithm 32 

2.4.3 Grid Sequencing 35 

2.4.4 Solution Adaptive Grids 49 

2.5 POSTPROCESSING 51

2.6 SUPERSONIC FREE STREAM FORMULATION 53 

2.6.1 Far Field Treatment 53 

2.6.2 Solution of Discrete Equations 53 

2.7 PROGRAMMING CONSIDERATIONS 57 

2.7.1 Memory Management 57 

2.7.2 Input/Output 58

2.7.3 Vectorization Issues 58

2.7.4 Data Structures

2.7.5 Program Libraries 61




3 RESULTS 63 

3.1 RESULTS FOR LINEAR FLOW 64 

3.1.1 Sphere 64 

3.1.2 ONERA M6 Wing 68 

3.1.3 F16 Fighter Aircraft 73 

3.2 RESULTS FOR NONLINEAR FLOW 75 

3.2.1 Sphere 75 

3.2.2 ONERA M6 Wing 79 

3.2.3 F16 Fighter Aircraft 85 

3.2.4 Boeing 747-200 92 

3.2.5 Axisymmetric Nacelle with Powered Plume 98 

3.2.6 Analysis of an Installed Transport with Power Effects 98 

3.3 RESULTS FOR SUPERSONIC FREE STREAM FLOW 102 

3.3.1 Cone-Sphere Configuration 102 

3.3.2 Delta Wing Configurations 103 

3.3.3 F16 Configuration 103 

3.3.4 Bow Shocks 119 

3.3.5 General Observations 122 

4 FUTURE DIRECTIONS 123 

4.1 IMPROVEMENTS TO THE METHOD 124 

4.1.1 Reliability and Efficiency Improvements 124 

4.1.2 Upwinding Improvements 125 

4.1.3 Solution Adaptive Grid Improvements 126 

4.1.4 Higher Order Elements 128 

4.2 EULER FORMULATION 129 

4.2.1 Properties of Euler Equations 129 

4.2.2 Problems with Euler Equations 133 

4.3 WAKE CAPTURING 134 

4.4 BOUNDARY LAYER 139 

4.5 DESIGN AND OPTIMIZATION 141 

5 CONCLUSIONS 143 

6 Acknowledgements 145 

A OCT-TREE DATA STRUCTURES 147 

A.1 DATA STRUCTURE ORGANIZATION 147

A.1.1 Base Grid 147

A.1.2 Oct-Trees 147

A.1.3 Terminology 149

A.2 DATA STRUCTURE REPRESENTATION 149

A.2.1 Header 149

A.2.2 Base Grid Descriptors 149

A.2.3 Refinement Families 151

A.2.4 Scratch Stack 152

A.2.5 T-box Map 152

A.2.6 Refinement Pointers 152

A.3 MAJOR ALGORITHMS 153

A.3.1 Data Structure Modification 153

A.3.2 Data Structure Interrogation 154

B OPERATOR DEFINITION 157

B.1 IMPLEMENTATION 157

B.2 TEST CASE 168

B.3 PRESSURE BOUNDARY CONDITION 180

C GMRES 185

C.1 GMRES ALGORITHM 185

C.2 PRECONDITIONING 187

C.3 GMRES AND SIMILAR METHODS 188

D POISSON SOLVER 193

D.1 SUMMARY OF THE POISSON SOLVER 193

D.1.1 Summary of the Green’s Function Algorithm 194

D.1.2 Summary of the Convolution Algorithm 195

D.2 THEORY OF GREEN’S FUNCTION ALGORITHM 197

D.2.1 The Green’s Function Definition 197

D.2.2 The Asymptotic Expansion 201

D.2.3 The Three Plane Representation 207

D.2.4 The Downstream Green’s Function 209

D.3 THEORY OF THE CONVOLUTION ALGORITHM 211

D.3.1 FFT Dirichlet and Neumann Poisson Solvers 211

D.3.2 Transform Algorithms 220

D.3.3 Implementation of the James Algorithm 225

E SPARSE SOLVER 231

E.1 NESTED DISSECTION ORDERING 231

E.2 MATRIX ASSEMBLY 233

E.3 MATRIX DECOMPOSITION 234

E.4 FORWARD/BACKWARD SUBSTITUTION 236

E.5 PERFORMANCE 236

REFERENCES 241 





List of Figures 

1.1 Complete Transport Configuration 4 

1.2 Typical Fighter-Type Configuration with Store 5 

1.3 TRANAIR Geometry Scheme 7 

2.1 Overview of the Numerical Method in TRANAIR 14 

2.2 Configuration Boundary Description in Terms of Networks of Panels . 16 

2.3 Box Finite Element With Eight Corner Unknowns 19 

2.4 Placement of Unknowns. All Grid Points Have $\Phi$ Unknowns 21

2.5 Pseudo-Unknown in Two Dimensions 25 

2.6 Upwinding Stencils in Two Dimensions for Negative x Edge 27 

2.7 Reduced Set and Possible First Dissector 32 

2.8 Iterate for Newton’s Method With Residual Damping for ONERA M6 Wing Case, $M_\infty = 0.84$, $\alpha = 3.06^\circ$, 91% Span Station 36

2.9 Iterates for Newton’s Method With Residual and Local Mach Number Damping for ONERA M6 Wing Case, $M_\infty = 0.84$, $\alpha = 3.06^\circ$, 91% Span Station 37

2.10 Partially Converged Iterate for the Second Continuation Step Using Viscosity Damping for ONERA M6 Wing Case, $M_\infty = 0.84$, $\alpha = 3.06^\circ$, 91% Span Station 38

2.11 Convergence Histories for Newton’s Method with Various Damping Strategies for ONERA M6 Wing Case, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 39

2.12 Convergence Histories for Newton’s Method, Newton’s Method with Viscosity Damping, and Grid Sequencing for ONERA M6 Wing Case, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 41

2.13 Cuts Through The Coarse and Medium Grids Generated by Grid Sequencing for ONERA M6 Wing at 91% Span, $M_\infty = 0.84$, $\alpha = 3.06^\circ$. 42

2.14 Cut Through the Fine Grid Generated by Grid Sequencing for ONERA M6 Wing at 91% Span, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 43

2.15 Cuts Through the Coarse and Medium Grids Generated by Grid Sequencing for ONERA M6 Wing at the Plane of Symmetry, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 44

2.16 Cut Through the Fine Grid Generated by Grid Sequencing for ONERA M6 Wing at the Plane of Symmetry, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 45

2.17 Surface Pressure for the Three Grids Generated by Grid Sequencing for ONERA M6 Wing, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 46





2.18 Directions for Velocity Component Differences in Error Indicators for Elements A-E 47

2.19 Initial Grid and two Grids Created in an Application of the Adaptive Method 50

2.20 Solutions With and Without Post Processing for a Sphere in Linear Flow, $M_\infty = 0.0$, and Transonic Flow, $M_\infty = 0.7$ 52

2.21 Solution Adaptive Grid (No. 1) for the Supersonic Cone 54 

2.22 Solution Adaptive Grid (No. 2) for the Supersonic Cone 55 

2.23 Solution Adaptive Grid (No. 3) for the Supersonic Cone 55 

2.24 Solution Adaptive Grid (No. 5) for the Supersonic Cone 56 

3.1 Paneling Used for Sphere in Linear Flow, 1600 Panels 65 

3.2 Cuts Through Four Grids for a Sphere in Linear Flow, $M_\infty = 0$. 66

3.3 Solutions on Four Grids for a Sphere in Linear Flow, $M_\infty = 0$ 67

3.4 Cuts Through Two Grids for the ONERA M6 Wing in Linear Flow, $M_\infty = 0$, $\alpha = 3.06^\circ$ 69

3.5 Waterline Cut Through ONERA M6 Coarse Grid 70

3.6 Three Solutions for the ONERA M6 Wing in Linear Flow at 20% span, $M_\infty = 0$, $\alpha = 3.06^\circ$ 71

3.7 Three Solutions for the ONERA M6 Wing in Linear Flow at 60% and 80% Span, $M_\infty = 0$, $\alpha = 3.06^\circ$ 72

3.8 F16 Aircraft Configuration 73

3.9 Surface Pressure at Two Stations on the F16 Wing, $M_\infty = 0.6$, $\alpha = 4.0^\circ$. 74

3.10 Cut Through the Grid for a Sphere in Transonic Flow, $M_\infty = 0.7$. 76

3.11 Convergence Histories for Viscosity Damping Method and Grid Sequencing Method for Sphere Case, $M_\infty = 0.7$ 77

3.12 Surface Mach Numbers for Sphere, $M_\infty = 0.7$ 78

3.13 Two Cuts Through Grid for ONERA M6 Wing, $M_\infty = 0.84$, $\alpha = 3.06^\circ$. 80

3.14 Waterline Cut Through Grid for ONERA M6 Wing, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 81

3.15 Comparison of Surface Pressure at Four Span Stations on ONERA M6 Wing, $M_\infty = 0.84$, $\alpha = 3.06^\circ$ 82

3.16 Grid Cuts at 70% Span for the ONERA M6 Wing, $M_\infty = .84$, $\alpha = 3.06^\circ$. 83

3.17 Grid Cuts at 0% and 44% Span for the ONERA M6 Wing, $M_\infty = .84$, $\alpha = 3.06^\circ$ 83

3.18 Pressure Coefficients for ONERA M6 Wing, $M_\infty = .84$, $\alpha = 3.06^\circ$. 84

3.19 Wing Pressures for the F16, $M_\infty = 0.9$, $\alpha = 4.0^\circ$ 85

3.20 Cuts Through the Grid for the F16 With Tanks and Missiles, $M_\infty = 0.9$, $\alpha = 4.0^\circ$ 86

3.21 Computed Wing Pressures for the F16 with Tanks and Missiles, $M_\infty = 0.9$, $\alpha = 4.0^\circ$ 87

3.22 Cuts Through the Final Adaptive Grid for the F16 With Tanks and Missiles, $M_\infty = 0.9$, $\alpha = 4.0^\circ$ 89





3.23 Computed Wing Pressures for Adaptive Grid Run of the F16 with Tanks and Missiles, Inboard Side of Strut and Crown Line, $M_\infty = 0.9$, $\alpha = 4.0^\circ$ 90

3.24 Cuts Through Adaptive Grid for the F16 With Tanks and Missiles near the Strake, $M_\infty = 0.9$, $\alpha = 4.0^\circ$ 91

3.25 747-200 Transport Configuration 93

3.26 Two Cuts Through TRANAIR Grid for 747-200 Case 94

3.27 Wing Pressures for 747-200, $M_\infty = 0.8$, $\alpha = 2.7^\circ$ 95

3.28 Grid Cuts at 69% and 96% Wing Span for 747-200, $M_\infty = .80$, $\alpha = 2.70^\circ$. 96

3.29 Wing Pressures for 747-200, $M_\infty = .80$, $\alpha = 2.70^\circ$ 97

3.30 Cut Through Grid for an Axisymmetric Powered Nacelle, $M_\infty = 0.1$. 98

3.31 Convergence History for Grid Sequencing Method for Axisymmetric Powered Nacelle Case, $M_\infty = 0.1$ 99

3.32 Static Pressure for Axisymmetric Powered Nacelle Compared to Experiment and Navier-Stokes Code Results, $M_\infty = 0.1$ 100

3.33 TRANAIR Analysis of a Transport with Wing/Body/Nacelle/Strut and Power 101

3.34 Cp Distribution on Cone Sphere at Mach 1.414. TRANAIR vs SIMP vs EMTAC and Analytic Solution 105

3.35 Final Computational Grid for Cone-Sphere Configuration 106

3.36 Cp Distribution on Subsonic Leading Edge Delta Wing at Mach 1.414. TRANAIR vs A502 Solution 107

3.37 Final Computational Grid for Subsonic Leading Edge Delta Wing Configuration 108

3.38 Cp Distribution on Supersonic Leading Edge Delta Wing at Mach 1.414. TRANAIR vs A502 Solution 109

3.39 Final Computational Grid for Supersonic Leading Edge Delta Wing Configuration 110

3.40 Cp Distribution on Upper Surface of F16, $M_\infty = 1.414$, TRANAIR vs SIMP 111

3.41 Cp Distribution on Lower Surface of F16, $M_\infty = 1.414$, TRANAIR vs SIMP 112

3.42 Cp Distribution on Side View of F16, $M_\infty = 1.414$, TRANAIR vs SIMP. 113

3.43 Cp Distribution on Upper Surface of F16, $M_\infty = 2.0$, TRANAIR vs SIMP 114

3.44 Cp Distribution on Lower Surface of F16, $M_\infty = 2.0$, TRANAIR vs SIMP 115

3.45 Cp Distribution on Side View of F16, $M_\infty = 2.0$, TRANAIR vs SIMP. 116

3.46 Cp Distribution on F16 Configuration with Tip Missiles, $M_\infty = 1.2$ and $\alpha = 4^\circ$, Comparison of TRANAIR with Test Data 117

3.47 Representative Cuts Through Computational Grid for F16 Configuration With Tip Missile 118

3.48 Cp Distribution on Sphere-Cone Configuration, $M_\infty = 1.414$ 120

3.49 Computational Grid for F16 Configuration With Tip Missiles and Wing Tanks, $M_\infty = 1.2$, $\alpha = 4^\circ$ 121




4.1 $\vec{W} \cdot \nabla S = 0$ (Good Upwind Discretization Scheme) Smooth Inflow Distribution 135

4.2 $\vec{W} \cdot \nabla S = 0$ (Good Upwind Discretization Scheme) Discontinuous Inflow Distribution 136

4.3 $\vec{W} \cdot \nabla S = 0$ Non-Diffusive Scheme, Every Value of Entropy Equal to Some Upstream Value. No Interpolation Allowed 137

4.4 Transonic Analysis Demands Viscous Coupling 140

A.1 Grid “Legalization” Example 148

A.2 Pseudo-refinement to Represent the Nodes 148

A.3 The Overview of the Oct-tree Data Structure 150

A.4 A Refinement Family 151

A.5 A Refinement Pointer Block 153

A.6 Finding Neighboring Boxes 155

B.1 Grid Box 158

B.2 Bateman Laplace Coefficients 165

B.3 Standard 7-Point Laplacian 166

B.4 One Panel Wing 169

B.5 D-region 4 171

B.6 Operator Coefficients for $\psi_1 = \phi_{113}$ 173

B.7 Operator Coefficients for $\phi_{60}$ 175

B.8 Operator Coefficients for 03 = ^ns 177 

B.9 Operator Coefficients for t/> 7 = <f>ng 178 

B.10 Wake Networks 181 

E.1 Block Structure of a Sparse Matrix Ordered with Nested Dissection. 232

E.2 Examples of Cutting Planes and Nodes in Resulting Dissector 233

E.3 Sorting and Merging Procedure 235

E.4 Cost versus drop tolerance for the ONERA M6 TRANAIR solution. 238

E.5 SSD storage versus drop tolerance for the ONERA M6 TRANAIR solution 239





List of Tables 


D.1 Eigenvectors and Eigenvalues 216

D.2 Transforms and Their Inverses 218

D.3 Transform Operation Counts 224

E.1 Performance Characteristics for the Sparse Solver with No Drop Tolerance. Ten to Twenty Nonlinear Newton Steps are Required for each Solution. Each Linearized Solution Requires about 10 GMRES Iterations 237

E.2 Performance Characteristics for the Sparse Solver with Drop Tolerance. Each Linearized Solution Requires About 20-40 GMRES Iterations. 237




NOMENCLATURE 


$C_{ij}$ = upwinding operator, see Eqn. (2.43)
DFT = discrete Fourier transform
$E(I,J,K)$ = line moment integral
FFT = fast Fourier transform
$\mathcal{F}$ = nonlinear partial differential operator, see Eqn. (2.1)
$F(I,J,K)$ = surface moment integral
$F(x)$ = discrete function representing the boundary value problem
$g_1$ = Neumann boundary condition parameter, see Eqn. (2.4)
$g_3$ = Dirichlet boundary condition parameter, see Eqn. (2.5)
$\mathcal{G}$ = continuous Green’s function
$G$ = discrete Green’s function
$H$ = total enthalpy
$H(I,J,K)$ = volume moment integral
$I$ = identity matrix
$J$ = Bateman functional, see Eqn. (2.9)
$L$ = discrete operator for the given boundary value problem
Lq, L = linearized version of $L$
$M$ = Mach number
$N$ = sparse solver preconditioner, or number of grid points
$\hat{n}$ = unit normal
$O$ = symbol designating “order” of magnitude
$p$ = static pressure
$q$ = magnitude of the velocity
$Q$ = source strength unknowns
$r$ = radial distance from the origin
$r_p$ = ratio of local total pressure to that at $\infty$
$r_T$ = ratio of local total temperature to that at $\infty$
$R$ = residual, or computational domain (box), or distance away from the boundary
$S_t(\nu)$ = blending function used in upwinding density
$\mathcal{T}$ = continuous far field (Prandtl-Glauert) operator
$T$ = discrete far field operator
$\vec{V}$ = velocity
$\vec{W}$ = mass flux vector $= \rho \vec{V}$
$\alpha$ = angle of attack, or average operator across a boundary
$\chi$ = GMRES unknowns
$\delta$ = discrete delta function, or variation
$\Delta$ = difference (or jump) across a boundary, or undivided difference
$\nabla$ = gradient, or divergence operator
$\epsilon$ = small change
$\gamma$ = adiabatic exponent
$\mu$ = $\Delta\Phi$ across a wake, see Eqn. (2.7), also switching function, see Eqn. (2.40)
$\nu$ = cutoff operator, or edge normal vector
$\Phi$ = total potential
$\phi$ = perturbation potential
$\psi$ = boundary potential
$\rho$ = density, see Eqn. (2.2)
$\Sigma$ = boundary or discontinuity surface, or summation sign
$\Omega$ = domain of integration
$\int_V$ = integral over volume or surface region $V$
$\partial R$ = boundary of the domain $R$

Subscripts

[symbol illegible] = one-sided derivative
$c$ = cut off value
$ps$ = pseudo unknown
$x$ = derivative in the coordinate direction $x$
$y$ = derivative in the coordinate direction $y$
$z$ = derivative in the coordinate direction $z$
$\infty$ = at infinite distance from the configuration




SUMMARY 


A new computer program, called TRANAIR, for analyzing complex configurations
in transonic flow (with subsonic or supersonic freestream) has been developed. This
program provides accurate and efficient simulations of nonlinear aerodynamic flows
about arbitrary geometries with the ease and flexibility of a typical panel method
program.

The numerical method implemented in TRANAIR is described in this report. 
The method solves the full potential equation subject to a set of general boundary 
conditions and can handle regions with differing total pressure and temperature. The 
boundary value problem is discretized using the finite element method on a locally 
refined rectangular grid. The grid is automatically constructed by the code and 
is superimposed on the boundary described by networks of panels; thus no surface 
fitted grid generation is required. The nonlinear discrete system arising from the finite 
element method is solved using a preconditioned Krylov subspace method embedded 
in an inexact Newton method. The solution is obtained on a sequence of successively 
refined grids which are either constructed adaptively based on estimated solution 
errors or are predetermined based on user inputs. Many results obtained by using 
TRANAIR to analyze aerodynamic configurations are presented. 

Chapter 1

INTRODUCTION 


1.1 MOTIVATION 

The role of computational modeling in engineering design has been well recognized 
for many years. Engineering problems are routinely solved through numerical simu- 
lations. In the aerospace industry, for example, aerodynamic flow about aircraft is 
often simulated using computational tools. Computational Fluid Dynamics (CFD) 
is rapidly becoming an equal partner with the wind tunnel and flight testing in the 
design of aerodynamic shapes [1, 2, 3]. 

Many engineering designs are geometrically complex. In aerodynamics, problems 
such as the analyses of close-coupled nacelles and high lift systems on a typical trans- 
port aircraft configuration (see Figure 1.1) can involve extremely complicated geome- 
tries and highly nonlinear flows containing shock waves and convected wakes. The 
geometry and flow become even more complex for a fighter type aircraft (see Fig- 
ure 1.2). There is a lack of tools that can routinely handle such complex geometries 
and treat the appropriate physical phenomena. 

Such tools should be reliable, accurate, flexible and efficient. Among the currently 
available computational tools, panel methods [4]-[14] have long been able to han- 
dle complex configurations and boundary conditions in a reliable manner, but they 
are limited to linear flow models. Aerodynamicists who use panel methods take for 
granted the ability to add, move, or delete components at will, readily select and 
change boundary condition types, and obtain accurate solutions at reasonable cost. 
Multiply connected regions (flap gaps, nacelle interiors), varying length scales, and 
flow features such as oblique shocks (in supersonic freestream flow), present few prob- 
lems to a good panel method. Consequently, there are many instances where designers 
have compromised the physics of the problem and used panel methods in order to 
try to understand the aerodynamic effects of complex geometry [15], [16]. However, 
there are other instances involving transonic flows with normal shocks where such 
compromises are not possible. 

The state of the art in calculating transonic flows has progressed significantly 
since the initial breakthrough by Murman and Cole [17]. Much success has been 
achieved in solving various forms of the potential equation, the Euler equations, and 
even the Navier-Stokes equations for special configurations [18]-[35]. Nevertheless,
routine analysis of complex configurations using more realistic physics has remained
a somewhat distant goal. There are several reasons for this.

Figure 1.1: Complete Transport Configuration.

Figure 1.2: Typical Fighter-Type Configuration with Store.

First and foremost, most current methods use surface fitted grids. Generation of 
such grids for complex, multiply connected domains is an extremely difficult task. 
Although significant progress has been made in grid generation techniques over the
last five years, timely treatment of configurations similar to those shown in Figures 1.1
and 1.2 still remains beyond the capabilities of these methods.

Second, current transonic algorithms place severe demands on grids. Few algo- 
rithms can adequately handle the anomalies which would result from the application 
of present grid generation techniques to complex configurations, e.g., non-analytic 
grids, collapsed edges, fictitious corners, oblique and/or high aspect ratio cells, etc. 

Third, many transonic methods are limited to normal flow boundary conditions. 
Panel methods routinely allow design boundary conditions, porous wall boundary 
conditions, surface jump conditions, etc. These types of boundary conditions are 
difficult to implement in the field grid methods. 

Fourth, computer run costs can be very high. In the course of airframe design, 
engineers make hundreds of runs varying angle of attack, Mach number, flap set- 
tings, inlet mass ratios, nacelle placement, etc. Thus it is imperative that individual 
runs be fairly economical. This is clearly a difficult goal to achieve for large config- 
urations, especially when the grid must be fine enough for the reliable prediction of 
drag increments. For many methods, the use of additional grid points substantially
degrades convergence, with a corresponding increase in cost.

In this report an approach developed to overcome these problems is described. This 
approach has been implemented in a computer program called TRANAIR [36]-[51].
It has all the modeling generality and flexibility offered by the PAN AIR technology 
panel code [10] while solving the nonlinear full potential equation. 

The most important feature of this approach is the use of rectangular grids combined
with an independently described configuration definition. The grid is superimposed
over the configuration surface, which is defined in terms of panels (see Figure 1.3).
This makes the program very easy to use since no surface fitted grids are required.
Clearly a rectangular grid can always be superimposed on the configuration regardless
of surface topology. However, several issues associated with rectangular grids must
be addressed.

First, in order to accurately capture small scale effects, even for relatively simple 
geometries, local grid refinement is essential. (It should be noted that local grid 
refinement is also necessary for other approaches that use surface fitted grids for 
complex configurations, since bunching grid lines is only feasible when there are just 
a few regions requiring dense grids.) The grids used in TRANAIR are therefore locally 
refined. 

Second, rectangular grids cannot take advantage of a directional difference in 
length scales in a straightforward manner since high aspect ratio cells skew to the 
coordinate axes cannot be created by varying grid density. However, this leads to 
only modest increases in the number of grid points for inviscid solutions if the locally 
refined grid is located judiciously. 

Figure 1.3: TRANAIR Geometry Scheme.

Third, combining a general boundary configuration with a rectangular grid produces
irregularly shaped regions near the boundary. The problem of defining accurate
discrete equations (operators) at node points adjacent to these regions can be solved 
by using a finite element method. In TRANAIR the finite element method is imple- 
mented using the Bateman variational principle [52]. A generalization of the Bateman 
principle allows easy implementation of nonstandard boundary conditions. Away from 
the boundary, application of the finite element method is straightforward except for
the treatment of infinite domains. The condition at infinity is satisfied using concepts
from integral equation methods whereby potential unknowns are transformed
to source unknowns. The inverse transformation is easily accomplished through the
use of fast Fourier transforms (FFTs) on rectangular grids.

The finite set of equations thus generated is solved iteratively by an inexact New- 
ton method. At each step of the Newton method the Generalized Minimal RESidual 
algorithm (GMRES) [53], [54] is used as a driver to solve the linearized problem. GM- 
RES is preconditioned by a Poisson solver and a direct sparse solver. Fast and reliable 
convergence is achieved by combining these preconditioners in a unique manner. The 
robustness of the method is achieved through the use of a sequence of refined grids 
which are either constructed adaptively based on estimated solution errors or are 
predetermined based on user inputs. 

1.2 REPORT ORGANIZATION

This is the Theory Document for the TRANAIR program. In this document the 
theoretical aspects of TRANAIR are described. A fairly comprehensive description of 
the method is provided in Chapter 2. Results obtained using the TRANAIR code are 
described in Chapter 3. In Chapter 4 ideas on future directions are discussed. Many 
topics which require more detail are discussed in the appendices, which cover
oct-tree data structures (Appendix A), implementation of the Bateman variational
principle (Appendix B), the GMRES algorithm (Appendix C), the exterior Poisson solver
(Appendix D), and the sparse solver preconditioner (Appendix E).

Information on how to use TRANAIR is provided in the User’s Manual [55]. Specif- 
ically, the preparation of input required by the program and the scripts (job control 
cards) required to run the code on the Cray Y-MP at NASA Ames Research Center 
(UNICOS) and the Cray X-MP at Boeing (COS) are described. 

Chapter 2

METHOD 


In this chapter a comprehensive description of the numerical method used in TRANAIR
is provided. This numerical method combines diverse component algorithms in an 
effective manner. 

There are seven sections in this chapter. In Section 2.1, the boundary value problem
to be solved is presented. In Section 2.2, an outline of the method is provided. In
Sections 2.3 and 2.4, the discretization and the solution techniques, respectively, are
described. In Section 2.5, a postprocessing technique to obtain smooth aerodynamic
quantities is presented. In Section 2.6, the extension of the method to problems 
in supersonic free stream flow is described. In Section 2.7, certain programming 
considerations are discussed. 


2.1 PROBLEM DEFINITION 


2.1.1 Governing Equation 

The full potential equation of aerodynamics is 


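$$\mathcal{F}(\Phi) = \nabla \cdot \rho \nabla \Phi = 0 \tag{2.1}$$

where $\Phi$ is the total velocity potential to be determined, and the density is given by

$$\rho = \rho_\infty \left[ 1 + \frac{\gamma - 1}{2} M_\infty^2 \left( 1 - \frac{q^2}{q_\infty^2} \right) \right]^{\frac{1}{\gamma - 1}} \tag{2.2}$$

Here, $q = |\nabla \Phi|$ is the local speed, $\vec{V}_\infty$ is the uniform onset flow, $q_\infty = |\vec{V}_\infty|$ is the
free stream speed, $\rho_\infty$ is the free stream density, $M_\infty$ is the free stream Mach number,
and $\gamma$ is the ratio of specific heats. Equation (2.1) describes the conservation of mass
in inviscid irrotational compressible flow.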
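
As a quick numerical check of the density law, the sketch below evaluates the standard isentropic form $\rho/\rho_\infty = [1 + \frac{\gamma-1}{2} M_\infty^2 (1 - q^2/q_\infty^2)]^{1/(\gamma-1)}$, which is what Eqn. (2.2) is taken to be here. The function name and its arguments are illustrative assumptions and are not part of TRANAIR.

```python
def density_ratio(q_ratio, mach_inf, gamma=1.4):
    """Return rho/rho_inf for the local speed ratio q_ratio = q/q_inf.

    Hypothetical helper: assumes the isentropic density law of Eqn. (2.2).
    """
    # Bracketed term of the isentropic density law.
    base = 1.0 + 0.5 * (gamma - 1.0) * mach_inf**2 * (1.0 - q_ratio**2)
    if base <= 0.0:
        # Beyond the limiting (vacuum) speed no real density exists.
        raise ValueError("q_ratio exceeds the limiting speed for this Mach number")
    return base ** (1.0 / (gamma - 1.0))


if __name__ == "__main__":
    # Density falls as the flow accelerates past the free stream speed.
    for q_ratio in (0.0, 0.5, 1.0, 1.3):
        print(q_ratio, density_ratio(q_ratio, mach_inf=0.84))
```

At $q = q_\infty$ the ratio is exactly one, and at $M_\infty = 0$ the density is constant, which recovers the incompressible (linear) limit.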


2.1.2 Boundary Conditions 

Boundary conditions on the configuration surface are required to define a well posed 
problem. The far field condition is 
</>=0(i) (2.3)

as $x \to -\infty$, i.e., upstream of the object. The perturbation potential is given by
$\phi = \Phi - \Phi_\infty$ where $\nabla \Phi_\infty = \vec{V}_\infty$.

A wide variety of boundary conditions may be specified on the aircraft configura- 
tion. Normal mass flux may be specified via 


$$\rho \frac{\partial \Phi}{\partial n} = g_1 \tag{2.4}$$

where $n$ represents the direction normal to the surface. On an impermeable surface
$g_1$ vanishes, whereas on surfaces such as engine inlets nonzero values for $g_1$ can be
specified.

On engine exhaust surfaces, it is possible to impose the Dirichlet condition 


$$\Phi = g_3 \tag{2.5}$$

where tangential flow can be prohibited by specifying $g_3$ to be constant.

Wakes must extend downstream from lifting bodies. These surfaces allow nonzero 
circulation in potential flow and can be thought of as thin sheets of concentrated 
vorticity [5]. The boundary conditions on a wake are 


$$\hat{n} \cdot \Delta(\rho \nabla \Phi) = 0 \tag{2.6}$$

and

$$\Delta p = 0 \tag{2.7}$$

where

$$p = p_\infty \left[ 1 + \frac{\gamma - 1}{2} M_\infty^2 \left( 1 - \frac{q^2}{q_\infty^2} \right) \right]^{\frac{\gamma}{\gamma - 1}} \tag{2.8}$$

$\hat{n}$ is the unit normal vector, and $\Delta$ represents the jump across the wake surface. Equation (2.6)
is an expression of conservation of mass across the wake. Equation (2.7)
is required for conservation of normal momentum. Equation (2.7) is often linearized
about the free stream pressure $p = p_\infty$ assuming small perturbation velocity $\nabla \phi$. This
leads to the Dirichlet condition that $\Delta\Phi$ is constant along the wake in the direction
of $\vec{V}_\infty$. The circulation $\mu$ at the trailing edge is determined by imposing a Kutta
condition there (see Appendix B.3).
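
The pressure law used in the wake condition can be explored numerically. The sketch below assumes the isentropic form $p/p_\infty = [1 + \frac{\gamma-1}{2} M_\infty^2 (1 - q^2/q_\infty^2)]^{\gamma/(\gamma-1)}$ for Eqn. (2.8) and the usual definition of the pressure coefficient, $C_p = (p - p_\infty) / (\frac{1}{2} \rho_\infty q_\infty^2)$. Both function names are hypothetical and chosen only for illustration.

```python
def pressure_ratio(q_ratio, mach_inf, gamma=1.4):
    """Return p/p_inf for q_ratio = q/q_inf (assumed isentropic form of Eqn. (2.8))."""
    base = 1.0 + 0.5 * (gamma - 1.0) * mach_inf**2 * (1.0 - q_ratio**2)
    if base <= 0.0:
        raise ValueError("q_ratio exceeds the limiting speed for this Mach number")
    return base ** (gamma / (gamma - 1.0))


def pressure_coefficient(q_ratio, mach_inf, gamma=1.4):
    """Return Cp = (p - p_inf) / (0.5 * rho_inf * q_inf**2)."""
    if mach_inf < 1.0e-6:
        # Incompressible limit of the expression below (Bernoulli).
        return 1.0 - q_ratio**2
    # 0.5 * rho_inf * q_inf**2 equals 0.5 * gamma * p_inf * mach_inf**2.
    return 2.0 / (gamma * mach_inf**2) * (pressure_ratio(q_ratio, mach_inf, gamma) - 1.0)


if __name__ == "__main__":
    for mach in (0.0, 0.6, 0.84):
        print(mach, pressure_coefficient(0.0, mach))  # stagnation Cp
```

For small $M_\infty$ the coefficient approaches $1 - q^2/q_\infty^2$, which is the sense in which the linearized wake condition treats the pressure jump as a condition on the velocity jump alone.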


2.1.3 Variational Formulation 

The full potential equation may also be derived from the Bateman variational prin- 
ciple [52], namely, that the integral of pressure over the flow field 
$$J = \int_\Omega p \, d\Omega \tag{2.9}$$

is stationary. This principle can be used to derive finite element formulas for the full
potential equation (see Section 2.3 below). Taking a variation of $J$ in Eqn. (2.9) and
using

[equation not legible in the source] (2.10)

it can be shown that

$$\delta J = \int_\Omega -\vec{W} \cdot \delta \vec{V} \, d\Omega \tag{2.11}$$

Integrating by parts,

$$\delta J = \int_\Omega (\nabla \cdot \vec{W}) \, \delta\Phi \, d\Omega - \int_\Sigma \hat{n} \cdot \vec{W} \, \delta\Phi \, d\Sigma \tag{2.12}$$


where $\Sigma$ is the boundary of the domain or surface of discontinuity with unit normal
$\hat{n}$. (The second integral on the right applies to both sides of $\Sigma$ in the case of a surface
of discontinuity.) If $J$ is stationary with respect to arbitrary variations $\delta\Phi$, the first
integral on the right of Eqn. (2.12) yields the mass conservation equation

$$\nabla \cdot \vec{W} = 0 \tag{2.13}$$

This equation is identical to Eqn. (2.1). The second integral on the right yields
conservation laws for surfaces of discontinuity. For a shock surface across which mass
is continuous it follows that

$$\Delta(\hat{n} \cdot \vec{W}) = 0 \tag{2.14}$$

For a slip surface across which $\Phi$ may be discontinuous, $\hat{n} \cdot \vec{W}$ must vanish on both
sides. The discontinuity in $\Phi$ is determined by Eqn. (2.7).

The natural boundary condition for Eqn. (2.9) may be deduced from Eqn. (2.12), i.e.,

$$\hat{n} \cdot \vec{W} = 0 \tag{2.15}$$
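
The step from Eqn. (2.9) to Eqn. (2.11) can be checked by differentiating the pressure law. The sketch below assumes the isentropic form of Eqn. (2.8) and the free stream relation $a_\infty^2 = \gamma p_\infty / \rho_\infty$; it is offered as a consistency check and is not necessarily the route taken through Eqn. (2.10).

```latex
\begin{align*}
p &= p_\infty \left[ 1 + \frac{\gamma-1}{2} M_\infty^2
     \left( 1 - \frac{q^2}{q_\infty^2} \right) \right]^{\frac{\gamma}{\gamma-1}},
\qquad
p_\infty = \frac{\rho_\infty q_\infty^2}{\gamma M_\infty^2}, \\
\frac{\partial p}{\partial (q^2)}
  &= -\frac{\gamma M_\infty^2 \, p_\infty}{2 \, q_\infty^2}
     \left[ 1 + \frac{\gamma-1}{2} M_\infty^2
     \left( 1 - \frac{q^2}{q_\infty^2} \right) \right]^{\frac{1}{\gamma-1}}
   = -\frac{\rho}{2}, \\
\delta p &= -\frac{\rho}{2} \, \delta(q^2)
   = -\rho \nabla\Phi \cdot \nabla \delta\Phi
   = -\vec{W} \cdot \delta\vec{V}.
\end{align*}
```

Integrating $\delta p$ over $\Omega$ reproduces Eqn. (2.11), and the density that appears is the one given by Eqn. (2.2).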

A generalization of the Bateman variational principle which incorporates the bound- 
ary conditions described above is that the functional 

$$J = \int_\Omega p \, d\Omega + \int_{\partial\Omega_1} g_1 \Phi \, dS - \int_{\partial\Omega_2} \alpha(\hat{n} \cdot \vec{W}) (\Delta\Phi - \mu) \, dS + \int_{\partial\Omega_3} \hat{n} \cdot \vec{W} \, (\Phi - g_3) \, dS \tag{2.16}$$

is stationary. Here, $g_1$ is the specified mass flux on $\partial\Omega_1$, $\mu$ is the jump in $\Phi$ across
the wake surface $\partial\Omega_2$, $\alpha$ denotes the average of the upper surface and lower surface
values, and $g_3$ is the specified potential on $\partial\Omega_3$. The unknown $\mu$ represents the jump
in $\Phi$ on $\partial\Omega_2$ and is determined by Eqn. (2.7).


2.1.4 Regions with Differing Total Properties 

A minor modification of the above formulation allows the simulation of flows involving 
regions of differing total temperature and total pressure. The flow in each separate 
region is still potential as long as total temperature and pressure are constant in the 
region, but pressure and density must be redefined in the following way: 


$$p = p_\infty r_p \left[ 1 + \frac{\gamma - 1}{2} M_\infty^2 \left( 1 - \frac{q^2}{q_\infty^2 r_T} \right) \right]^{\frac{\gamma}{\gamma - 1}} \tag{2.17}$$

$$\rho = \rho_\infty \frac{r_p}{r_T} \left[ 1 + \frac{\gamma - 1}{2} M_\infty^2 \left( 1 - \frac{q^2}{q_\infty^2 r_T} \right) \right]^{\frac{1}{\gamma - 1}} \tag{2.18}$$

Here, $r_p$ is the ratio of the total pressure in the region to the free stream total pressure
and $r_T$ is the ratio of total temperature in the region to free stream total temperature.
The regions are assumed to be separated by fixed wake surfaces on which two jump
boundary conditions are applied. The first is the standard static pressure continuity
condition, Eqn. (2.7). If the total pressure and/or temperature differences across the
wake are large, the pressure formula, Eqn. (2.7), cannot be linearized, i.e., $\mu$ cannot be
assumed to be constant in the downstream direction. The second condition is similar
to Eqn. (2.6) but requires a modification to make the answer less sensitive to wake
position when total pressure and temperature differences are large. Equation (2.6) is
replaced by

\[ \hat{n} \cdot \Delta \vec{W}' = 0 \tag{2.19} \]

where

\[ \vec{W}' = \frac{\vec{W}}{\rho_0 q_0} \tag{2.20} \]

Here, q₀ is the speed which makes p = p∞ in the given region and ρ₀ is the density
at this speed. Equation (2.6) becomes a natural jump boundary condition for Φ′ =
q∞Φ/q₀ if the Bateman principle is modified so that

\[ J = \int_{\Omega} p' \, dV \tag{2.21} \]

where

\[ p' = p \, \frac{\rho_\infty q_\infty^2}{\rho_0 q_0^2} \tag{2.22} \]
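
The regional pressure and density, and the matching speed q₀, can be illustrated with a minimal sketch. It assumes the standard isentropic relations implied by the definitions of r_p and r_T above; the function names, default values, and unit normalization (p∞ = ρ∞ = q∞ = 1) are hypothetical choices for illustration and are not taken from TRANAIR.

```python
import math


def region_state(q, r_p=1.0, r_T=1.0, mach_inf=0.8, gamma=1.4,
                 p_inf=1.0, rho_inf=1.0, q_inf=1.0):
    """Return (p, rho) at speed q in a region with total-property ratios r_p, r_T.

    Assumed isentropic form: the free stream bracket with q^2 scaled by 1/r_T,
    pressure scaled by r_p and density scaled by r_p / r_T.
    """
    bracket = 1.0 + 0.5 * (gamma - 1.0) * mach_inf**2 * (1.0 - q**2 / (r_T * q_inf**2))
    p = p_inf * r_p * bracket ** (gamma / (gamma - 1.0))
    rho = rho_inf * (r_p / r_T) * bracket ** (1.0 / (gamma - 1.0))
    return p, rho


def matching_speed(r_p=1.0, r_T=1.0, mach_inf=0.8, gamma=1.4, q_inf=1.0):
    """Speed q0 at which the regional static pressure equals p_inf.

    Raises ValueError (from math.sqrt) if no real speed satisfies the condition.
    """
    target = r_p ** (-(gamma - 1.0) / gamma)  # value the bracket must take
    q0_sq = r_T * q_inf**2 * (1.0 - 2.0 * (target - 1.0) / ((gamma - 1.0) * mach_inf**2))
    return math.sqrt(q0_sq)


# Example: a hot, high-pressure exhaust region (illustrative ratios only).
q0 = matching_speed(r_p=1.5, r_T=2.0)
p0, rho0 = region_state(q0, r_p=1.5, r_T=2.0)
```

With r_p = r_T = 1 the sketch reduces to the free stream relations, so q₀ = q∞ and ρ₀ = ρ∞.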


Using this feature, exhaust from engines can be modeled as long as the exhaust can be 
divided into a finite number of regions each of which has a constant entropy. Section 
3.2.5 gives an example of this type of modeling.


2.2 OUTLINE OF THE METHOD

An overview of the numerical method in TRANAIR is shown in Figure 2.1. 

TRANAIR uses a locally refined rectangular grid. This grid is generated (Sec- 
tion 2.3.3) in a computational domain which extends only as far as is required to 
enclose all configuration surfaces and any nonlinear flow (Section 2.3.2). First a uni- 
form global grid is constructed over the rectangular computational region. Grid cells 
in the global grid are refined by subdividing a given cell into eight similar cells. The 
decision to refine or not is controlled by two criteria. The first uses specified minimum
and maximum cell sizes. The second uses the density of the paneling that defines
the surface geometry. The size of the local grid box is forced to lie within
the specified limits, and denser grid is automatically generated where the panels are smaller.
Both these forms of control can be exercised over a global region or over certain spe- 
cially prescribed hexahedral regions. Thus, it is possible to specify arbitrary local 
hierarchical refinement. 

The solution process is carried out over a sequence of grids. This sequence
is either predetermined by successively derefining the grid constructed above (the
grid sequencing option) or constructed adaptively (the solution adaptive grid
option), in which the solution is started on the coarsest grid in the above sequence and
the subsequent grids are constructed based on estimates of solution error.

The process of obtaining a solution on each grid is essentially the same in either 
option and involves two steps. The first involves discretization of the continuous 
problem and the other involves the solution of the discrete equations. 

In the discretization step, the field unknowns Φ are defined at the eight corners
of each grid cell and the potential in the cell is defined via trilinear basis functions
(Section 2.3.4). The unknown parameters are supplemented by boundary unknowns
Ψ which are values of potential extrapolated across a boundary, and wake unknowns
μ which are introduced to satisfy the normal momentum jump condition Eqn. (2.7).

The Bateman variational principle is used to accurately discretize the continuous 
problem (Section 2.3.4). In particular, this discretization is flux conservative. Special 
discrete operators are generated for the unknowns near the boundary. 

The nonlinear discrete equations are solved iteratively using a Newton method
(Section 2.4.2). An initial guess required to start the Newton method is obtained
by interpolating the solution on a coarse grid to the next finer one, except on the
coarsest grid, where the perturbation potential φ is taken to be exactly zero.

For complex configurations involving correspondingly complex physical phenom- 
ena, it is advantageous to use more than one technique to solve the algebraic equa- 
tions. Each technique by itself might reduce errors more rapidly in some subsets 
of physical and frequency space than in others. Hence it is desirable to treat these 
techniques as “preconditioners” for an overall convergence stabilization and acceler- 
ation scheme such as GMRES (Section 2.4.1). The operators from the finite element 
method are used to compute residuals and the Jacobian matrix which is inverted 
using a sparse matrix solver and used as one of the preconditioners (see Appendix E). 
The iteration is stopped after the converged solution is obtained on the final grid in
the sequence.

1. Generate finest grid based on geometry and user specified controls

2. Extract a sequence of grids from the finest grid by repeatedly
   coarsening all parts of the grid by one refinement level

3. For each grid (starting with the coarsest) solve the problem

   3.1 Generate boundary operators

   3.2 Generate Green’s function for the current global grid

   3.3 Interpolate initial solution for the current grid

   3.4 Solve the discrete equations using Newton’s method

       3.4.1 Generate and decompose the Jacobian if needed

       3.4.2 Solve the linearized problem via GMRES

             Compute residuals

             Combine sparse solver and Poisson solver preconditioners

       3.4.3 Compute nonlinear update

       3.4.4 Return to 3.4.1 if Newton’s method did not converge

   3.5 If solution adaptive gridding used

       3.5.1 Compute local error estimates

       3.5.2 Compute new grid

   3.6 If not on final grid then return to 3.1

4. Extract aerodynamic output

Figure 2.1: Overview of the Numerical Method in TRANAIR.

At the conclusion of the solution process, the potential at each grid point is available.
From the potential, a wide variety of information concerning the flow about the
configuration may be obtained. Velocities in the field about the configuration may 
be computed, from which streamlines, Mach contours, density distributions, pres- 
sure coefficients, force and moment coefficients about the configuration, etc., can be 
computed. 


2.3 DISCRETIZATION 

In this section, the discretization process is described. First, the representation of the 
surface boundary is described. Next, the restriction of computations to a finite region 
is justified. Then, the computational grid and its generation are described. This is 
followed by a description of the finite element operators. Grid interfaces between 
different levels of refinement, modifications of the Bateman principle, and artificial 
dissipation are described at the end. 

2.3.1 Boundary Representation 

In TRANAIR, the boundary is described independently of the volume discretization.
The geometry of the configuration is represented by a set of networks each consisting
of a rectangular array of corner points which form arbitrarily shaped quadrilaterals
called panels (see Figure 2.2). The panels serve the purpose of limiting the region of
integration for the Bateman variational functional. No fundamental unknowns (such
as the doublets or sources in linear panel methods) are associated with the panels
except for wakes, where a discrete set of doublet unknowns, μ, are defined at various
corner points of the wakes.

2.3.2 Finite Computational Domain 

The computations are restricted to a finite subset of the infinite space. The restriction 
to a finite computational domain can be justified in the following way. 

Suppose that the partial differential operator ℒ is equal to a constant coefficient
differential operator 𝒯 everywhere outside a finite rectangular region. Let 𝒢 be a
Green’s function for 𝒯 such that 𝒯(𝒢 * Q) = Q for all Q (where Q are called sources)
and Φ = 𝒢 * Q + Φ∞ satisfies the far field condition. Then the original differential
equation ℒΦ = 0 is equivalent to

\[ Q + (\mathcal{L} - \mathcal{T})(\mathcal{G} * Q + \Phi_\infty) = 0. \tag{2.23} \]

Outside the finite rectangular region, ℒ = 𝒯 so that Q = 0. Thus, the unknowns Q
are confined to a bounded region.
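
The equivalence can be checked in a few lines. The sketch below assumes, in addition to the stated properties of the Green's function, that the free stream potential satisfies the far field operator exactly (𝒯Φ∞ = 0), which holds for a uniform free stream and a constant coefficient second order operator without lower order terms.

```latex
\begin{aligned}
\mathcal{L}\Phi
  &= \mathcal{T}\Phi + (\mathcal{L} - \mathcal{T})\Phi \\
  &= \mathcal{T}(\mathcal{G} * Q) + \mathcal{T}\Phi_\infty
     + (\mathcal{L} - \mathcal{T})(\mathcal{G} * Q + \Phi_\infty) \\
  &= Q + (\mathcal{L} - \mathcal{T})(\mathcal{G} * Q + \Phi_\infty)
\end{aligned}
```

Wherever the two operators coincide, the second term vanishes identically and the condition that the left side be zero forces Q = 0 at that point.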

For full potential flow, the far field operator is the Prandtl-Glauert operator

\[ \mathcal{T}\Phi = (1 - M_\infty^2)\,\Phi_{xx} + \Phi_{yy} + \Phi_{zz} \tag{2.24} \]

Equation (2.24) is a linearization of the full potential Eqn. (2.1) about V∞.

Figure 2.2: Configuration Boundary Description in Terms of Networks of Panels

For most problems Q approaches zero (away from the boundary) much more
rapidly than Φ approaches Φ∞. This enables the termination of the computational
domain a short distance away from the boundary or regions of nonlinear flow. Wakes, 
which extend to infinity, are exceptions which produce sources and sinks that extend 
to infinity downstream of the configuration. By assuming that such sources and sinks 
are constant in the downstream direction, their influence can be computed by using 
a downstream Green’s function (see Appendix D). 

If the continuous operators are replaced by discrete ones, the same argument holds. 
The requirement on T, the discrete version of 𝒯, is that a discrete Green’s function
G is available that satisfies an appropriate discrete far field condition and T(G *
Q) = Q for all Q. In practice, this means that T is a constant coefficient elliptic
operator discretized on a uniform Cartesian grid [36], [39], [89], [90], [94], [96]. Thus, the
computational domain need only cover the region where nonlinear flow occurs and
where the discrete version L of the operator ℒ is not approximated well by a discrete
far field operator T. The discrete operator T used in the method is the standard
seven point finite difference operator.
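
A seven point operator of this kind can be sketched as follows. The sketch assumes that the Prandtl-Glauert coefficients (1 − M∞², 1, 1) are applied to the three central second differences on a uniform grid of spacing h; the function name and the nested-list storage are hypothetical and chosen only for illustration.

```python
def seven_point_operator(phi, h, mach_inf):
    """Apply (1 - M^2) d2/dx2 + d2/dy2 + d2/dz2 at the interior nodes of a uniform grid.

    phi is a nested list indexed phi[i][j][k]; h is the uniform grid spacing.
    Returns a dict mapping each interior index (i, j, k) to the operator value.
    """
    beta2 = 1.0 - mach_inf**2
    nx, ny, nz = len(phi), len(phi[0]), len(phi[0][0])
    out = {}
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                c = phi[i][j][k]
                # Central second differences use the six face neighbors plus the center.
                dxx = phi[i + 1][j][k] - 2.0 * c + phi[i - 1][j][k]
                dyy = phi[i][j + 1][k] - 2.0 * c + phi[i][j - 1][k]
                dzz = phi[i][j][k + 1] - 2.0 * c + phi[i][j][k - 1]
                out[(i, j, k)] = (beta2 * dxx + dyy + dzz) / h**2
    return out


# Quadratic test field: second differences are exact, so the result is
# 0.75 * 2 + 2 - 4 = -0.5 at the single interior node for M = 0.5.
field = [[[float(i * i + j * j - 2 * k * k) for k in range(3)]
          for j in range(3)] for i in range(3)]
result = seven_point_operator(field, h=1.0, mach_inf=0.5)
```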

2.3.3 Computational Grid 

The volume grid in the finite computational domain is generated automatically by the 
code. TRANAIR can operate in one of two modes, either grid sequencing (discussed 
in Section 2.4.3) or solution adaptive gridding (described in Section 2.4.4). In either 
case it is necessary to specify a starting grid. The process for constructing the grid 
is described below. 

First, the finite computational region is chosen to be rectangular and is divided 
into a coarse uniform rectangular grid, called the global grid, which is independent 
of the boundary surfaces. To facilitate matching between the nonlinear flow inside 
and the linear flow outside, the global grid includes one plane of unrefined global grid 
boxes on each face of the computational domain where L = T. These boxes remain 
unrefined and no boundary surface is allowed to cut these boxes (except boxes on the 
downstream face of the computational domain which can be cut by wakes). This is 
required because the source Q is assumed to be zero on the boundary points of the 
global grid. 

The global grid is locally refined in a hierarchical manner, i.e., any grid box can 
be refined into eight geometrically similar boxes of equal volume. This process is 
repeated to give a grid with any desired local resolution and is controlled by two 
criteria. 

The first criterion for local refinement is based on the length scale of the surface 
panels used to describe the boundary. Every box element that is sufficiently close 
to a panel is refined if a weighted length scale (the panel diameter multiplied by a 
panel tolerance factor which is provided as input) associated with the panel is smaller 
than the length scale associated with that box element. A box is deemed to be in the 
neighborhood of a panel if a scaled version of it generated by expanding it around 
its center intersects that panel. The expansion factor by which the boxes are scaled 
can also be specified. This factor is useful if it is desired to extend the effect of the
presence of the boundary on the grid refinement further out into the volume grid.
The panel based criterion is effective in providing local refinement near the boundary
surface.

The second criterion is the requirement that the local box size is restricted to be in a
specified range (dx_min, dx_max). Boxes are refined recursively until their size is smaller
than dx_max. Further refinement depends on the panel based criterion. Refinement
based on comparison with dx_max is useful in problems where high gradients, such as
shock waves, exist in regions away from the boundary surface.

Both criteria for refinement can be invoked over either the entire computation 
box or over certain special regions of interest or disinterest where different desired 
ranges of box sizes and panel tolerance factors are defined. The special regions are 
hexahedral and provide ample flexibility in generating desired local refinement. 
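
The interaction of the two refinement criteria can be illustrated with a minimal octree sketch. Several simplifications are assumptions made here and are not part of the method as described: each panel is reduced to an axis-aligned bounding box plus a diameter, a single global range (dx_min, dx_max) is used, and the names `Box`, `Panel`, `near`, and `refine` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    center: tuple
    size: float
    children: list = field(default_factory=list)

    def subdivide(self):
        """Split into eight geometrically similar boxes of equal volume."""
        h = self.size / 4.0
        cx, cy, cz = self.center
        self.children = [Box((cx + sx * h, cy + sy * h, cz + sz * h), self.size / 2.0)
                         for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


@dataclass
class Panel:  # hypothetical stand-in: bounding box corners and panel diameter
    lo: tuple
    hi: tuple
    diameter: float


def near(box, panel, expand):
    """True if the box, scaled about its center by `expand`, overlaps the panel bounds."""
    half = 0.5 * expand * box.size
    return all(box.center[a] - half <= panel.hi[a] and box.center[a] + half >= panel.lo[a]
               for a in range(3))


def refine(box, panels, dx_min, dx_max, tol, expand=1.0):
    """Refine while the box exceeds dx_max, or while a nearby panel asks for it."""
    too_big = box.size > dx_max
    can_split = box.size / 2.0 >= dx_min  # children must not fall below dx_min
    wanted = any(tol * p.diameter < box.size and near(box, p, expand) for p in panels)
    if too_big or (can_split and wanted):
        box.subdivide()
        for child in box.children:
            refine(child, panels, dx_min, dx_max, tol, expand)


def leaves(box):
    """Collect the unrefined boxes of the tree."""
    if not box.children:
        return [box]
    return [leaf for child in box.children for leaf in leaves(child)]
```

In this sketch a small panel near one corner of the domain drives the refinement down to dx_min around that corner, while boxes far from the panel stop at dx_max. The legalization step described next is not included.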

Once all specified refinements are done, the grid is legalized so that two boxes 
abutting on a face or an edge differ by at most one refinement level. This ensures 
sparse stencils for the finite element operators and simplifies certain data structures 
[56], but allows sufficiently rapid changes in grid level. 

Locally refined box elements formed by the process described above are usually 
an unstructured collection of box elements. To completely describe these elements 
an oct-tree data structure is used. The tree is formed as boxes are created through 
refinement. A box and node numbering system is developed from the tree and adja- 
cency and other information is extracted. (See [41], [57] and Appendix A for more 
details.) 

A grid generated in this manner may be considered to be a prescribed grid. This 
grid is sequentially derefined to generate a set of coarse to fine grids. If no solution 
adaptation is used then the iterative solution is obtained on this sequence of grids 
by starting with the coarsest and moving through the finer grids. If the adaptive 
method is used then the coarsest grid is used to start the solution and all subsequent 
grids are determined according to estimated local solution errors. In either case the 
initial guess for the starting grid is zero and for each subsequent grid is obtained by 
interpolating the solution from the previous grid. 

2.3.4 Finite Element Operators 

The discrete operators on every grid in the sequence are constructed using a finite 
element method. Implementation of the finite element method for rectangular boxes 
away from the boundary and irregular boxes near the boundary is described below. 

Element Trial Functions 

Every rectangular element is geometrically identical except for a scale factor. The 
standard trilinear element trial function, parameterized by eight corner unknowns is 
used (see Figure 2.3). The trial functions used for elements near the boundary are 
also represented in a similar manner (see below). In order to generate as compact 
a stencil as possible (e.g., standard seven point operator for Poisson’s equation on a 
uniform grid) certain lumping terms are added (see Appendix B).

Figure 2.3: Box Finite Element With Eight Corner Unknowns.

Implementation of the Variational Principle 

Element stiffness matrices are generated by taking variations of the functional J with
respect to each degree of freedom. If only natural boundary conditions are present
(reiterating Eqn. (2.11) and noting that W = ρV and δV = ∇δΦ), the variations of J are
given by

\[
\begin{aligned}
\delta J &= -\int_{\Omega} \vec{W} \cdot \nabla \delta\Phi \, dV \\
         &= -\sum_i \int_{\Omega_i} \rho \nabla\Phi \cdot \nabla \delta\Phi \, dV \\
         &\approx -\sum_i \rho_i \int_{\Omega_i} \nabla\Phi \cdot \nabla \delta\Phi \, dV
\end{aligned}
\tag{2.25}
\]

Here, ρ_i is the value of ρ at the centroid of the elemental region Ω_i. The last step
in Eqn. (2.25) is equivalent to replacing ρ by a piecewise constant function. This
approximation of the operator coefficients maintains second order accuracy for the
potential in the L² norm and first order accuracy in the energy norm [58].
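
For a box that is not cut by the boundary, the integral multiplying ρ_i in the last line of Eqn. (2.25) can be tabulated once for the eight trilinear shape functions. The following is a minimal sketch of that tabulation, assuming a cubic box of side h and two-point Gauss quadrature (which is exact for this integrand). The lumping terms mentioned above are not included, and the names are hypothetical.

```python
import itertools
import math

CORNERS = list(itertools.product((0, 1), repeat=3))  # eight corners of the unit box


def shape_gradient(corner, point):
    """Gradient of the trilinear shape function of `corner` at `point` in the unit box."""
    # Each 1D factor is x if the corner coordinate is 1, else 1 - x.
    vals = [p if c == 1 else 1.0 - p for c, p in zip(corner, point)]
    ders = [1.0 if c == 1 else -1.0 for c in corner]
    return (ders[0] * vals[1] * vals[2],
            vals[0] * ders[1] * vals[2],
            vals[0] * vals[1] * ders[2])


def element_stiffness(h=1.0, rho=1.0):
    """8x8 matrix rho * integral of grad(N_a) . grad(N_b) over a cube of side h."""
    g = 0.5 / math.sqrt(3.0)
    gauss = [0.5 - g, 0.5 + g]  # two-point Gauss rule on [0, 1], each weight 1/2
    K = [[0.0] * 8 for _ in range(8)]
    for point in itertools.product(gauss, repeat=3):
        grads = [shape_gradient(c, point) for c in CORNERS]
        for a in range(8):
            for b in range(8):
                dot = sum(x * y for x, y in zip(grads[a], grads[b]))
                # Gradients scale as 1/h and the volume as h^3, hence the factor h.
                K[a][b] += rho * h * 0.125 * dot
    return K
```

The matrix depends on the element only through the product ρ_i h, which is the property that allows a single stored matrix to serve all similar boxes. Each row sums to zero because a constant potential produces no flux.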

Near Field and Far Field Boxes 

Near field boxes are boxes not cut by any boundary surface, but where L ≠ T.
Equation (2.25) defines the element stiffness matrices by considering variations of J
with respect to each of the eight corner unknowns of the element. Thus, every near
field box has the same element stiffness matrix up to a constant factor that depends
only on the refinement level of the element and ρ_i. This results in large savings in
storage. In addition, ρ is a nonlinear function of the velocity and is evaluated at the
centroid of each element during every iteration. Thus, discrete formulas for velocity
at the centroids in terms of the unknowns at the corners of the element are needed.
Since all near field box elements are similar, only one velocity formula needs to be
stored, resulting in additional large savings in storage.
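
For a trilinear trial function on a cubic box of side h, the centroid velocity reduces to a fixed set of weights ±1/(4h) on the eight corner values. A minimal sketch follows, with a hypothetical function name and a dictionary keyed by corner indices used as illustrative storage.

```python
import itertools

CORNERS = list(itertools.product((0, 1), repeat=3))  # corner (i, j, k), each index 0 or 1


def centroid_velocity(phi, h):
    """Gradient of the trilinear interpolant at the box centroid.

    phi maps each corner tuple to the potential there; h is the box side length.
    At the centroid each shape function derivative equals +-1/(4h).
    """
    grad = [0.0, 0.0, 0.0]
    for corner in CORNERS:
        for axis in range(3):
            sign = 1.0 if corner[axis] == 1 else -1.0
            grad[axis] += sign * phi[corner] / (4.0 * h)
    return tuple(grad)


# A linear potential is reproduced exactly: Phi = 2x + 3y - z + 5 on a box of side 0.5.
h = 0.5
phi = {c: 2.0 * c[0] * h + 3.0 * c[1] * h - c[2] * h + 5.0 for c in CORNERS}
velocity = centroid_velocity(phi, h)
```

Because the weights depend only on h, the same formula applies to every similar box after scaling.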

The far field boxes lie on faces of the computational domain where L = T. These 
boxes are geometrically identical. Also the density is constant in these boxes since
the linear flow properties are matched. Hence the operators for all such boxes are
identical.

Boundary Boxes and D Regions 

Boundary boxes are those cut by a boundary surface. A connected subset of a boundary
box is referred to as a D-region (see Figure 2.4). Each D-region is bounded by
a subset of the boundary surface as well as possibly subsets of the box faces. No
D-regions are defined in the interior of the configurations since the flow there is defined
to satisfy Φ = 0. Hence, a box cut by a boundary surface with flow on one side and
no flow (referred to as stagnation) on the other would have only one D-region, for
example region D₃ in Figure 2.4. It is possible to have more than one D-region in a
boundary box. This is the case in Figure 2.4 for D-regions D₁ and D₂ where a wake
divides an element.

Since boundary conditions on a surface can induce discontinuities in Φ or ∇Φ, a
separate element trial function is needed for each D-region. The element trial function
for each such D-region is parameterized by unique unknowns at the corners of the
grid box. Corner unknowns on the other side of a boundary surface from their
D-region can be viewed as extrapolated values and are denoted by Ψ. In Fig. 2.4 the

unknowns correspond to the element trial function in D 2 and the unknowns 
correspond to the element trial function in D₁. There is a one-to-one correspondence
between element trial functions in the box and D-regions. 

Stiffness Matrices for D-regions 

Each D-region also has a distinct element stiffness matrix that must be stored. The 
coefficient of the differential operator, ρ, is evaluated at the centroid of the D-region
and hence distinct velocity operators are also required for each D-region. However, 
these boundary elements represent typically only 10 to 20% of the elements needed 
to give an accurate solution of the boundary value problem. 

The element stiffness matrices for D-regions are derived from an expanded version of
Eqn. (2.25) including appropriate surface integral terms. The domain of integration
for the volume integral is the relevant D-region. The domain for the surface integrals 
is the intersection of the boundary with the boundary of that D-region. Since the 
integrand is a product of polynomials, volume moments must be computed over the 
D-region. 

Consider the volume moment

\[ H(I, J, K) = \int_{D} x^{I-1} y^{J-1} z^{K-1} \, dV. \tag{2.26} \]

Since the boundary is parameterized by piecewise flat panels, this moment can be
computed exactly via the following procedure. By Gauss’ theorem

\[ H(I, J, K) = \frac{1}{I + J + K} \int_{S} x^{I-1} y^{J-1} z^{K-1} \, (\hat{n} \cdot \vec{R}) \, dS \tag{2.27} \]

where S is the bounding surface and R = (x, y, z). Since S is the union of flat surfaces
S_i,

\[ H(I, J, K) = \frac{1}{I + J + K} \sum_i F_{S_i} \tag{2.28} \]

where

\[ F_{S_i} = n_x F(I+1, J, K) + n_y F(I, J+1, K) + n_z F(I, J, K+1) \tag{2.29} \]

and

\[ F(I, J, K) = \int_{S_i} x^{I-1} y^{J-1} z^{K-1} \, dS \tag{2.30} \]

Each S_i is assumed to be a polygon whose perimeter is the union of straight lines
Γ_ij. Using Stokes’ theorem, F(I, J, K) for fixed i may be evaluated recursively by the
formula

\[ F(I, J, K) = \frac{1}{I + J + K - 1} \left[ (\hat{n} \cdot \vec{R}) F_n + \sum_j E_{\nu_j} \right] \tag{2.31} \]

where

\[
\begin{aligned}
F_n &= n_x (I-1) F(I-1, J, K) + n_y (J-1) F(I, J-1, K) + n_z (K-1) F(I, J, K-1) \\
E_\nu &= \nu_x E(I+1, J, K) + \nu_y E(I, J+1, K) + \nu_z E(I, J, K+1)
\end{aligned}
\]

where, with t denoting the edge tangent vector, ν is the edge normal vector t × n, and
E(I, J, K) is defined by

\[ E(I, J, K) = \int_{\Gamma_{ij}} x^{I-1} y^{J-1} z^{K-1} \, dl. \tag{2.32} \]

Using simple one dimensional integration formulas E(I, J, K) may be evaluated
recursively by the formula

\[ E(I, J, K) = \frac{1}{I + J + K - 2} \left[ E_\zeta + D_t \right] \tag{2.33} \]

where

\[
\begin{aligned}
E_\zeta &= (I-1) \zeta_x E(I-1, J, K) + (J-1) \zeta_y E(I, J-1, K) + (K-1) \zeta_z E(I, J, K-1) \\
D_t &= t_x D(I+1, J, K) + t_y D(I, J+1, K) + t_z D(I, J, K+1)
\end{aligned}
\]

where ζ = (t × R) × t (constant along Γ_ij), and D(I, J, K) = x^{I-1} y^{J-1} z^{K-1} |_1^2, where
1 and 2 represent the initial and final points of Γ_ij. Thus, the original integrals (2.26)
defined over a complicated volume can be systematically reduced to point evaluations
at vertices of the bounding surface. The surface moments arising out of the surface
integrals in Eqn. (2.16) can be computed starting with Eqn. (2.30). Note that the
location of the centroid in each D-region can be computed from the zero and first
order moments. Further details on the operator generation including an example of
operator calculations are given in Appendix B.
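
The reduction from volume moments to vertex evaluations can be sketched directly from the recursions (2.28) through (2.33). The following is an illustrative sketch, not the TRANAIR implementation. It assumes a closed polyhedron whose faces are flat polygons listed counterclockwise when viewed from outside, and all function names are hypothetical. No memoization is used, so it is suitable only for low order moments.

```python
import math


def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def shifted(fn, vec, I, J, K):
    """vec_x fn(I+1,J,K) + vec_y fn(I,J+1,K) + vec_z fn(I,J,K+1)."""
    return vec[0] * fn(I + 1, J, K) + vec[1] * fn(I, J + 1, K) + vec[2] * fn(I, J, K + 1)


def lowered(fn, vec, I, J, K):
    """(I-1) vec_x fn(I-1,J,K) + ...; terms with zero weight are skipped."""
    total = 0.0
    if I > 1: total += (I - 1) * vec[0] * fn(I - 1, J, K)
    if J > 1: total += (J - 1) * vec[1] * fn(I, J - 1, K)
    if K > 1: total += (K - 1) * vec[2] * fn(I, J, K - 1)
    return total


def edge_moment(p1, p2, I, J, K):
    """E(I,J,K) along the straight edge p1 -> p2, Eqn. (2.33)."""
    t = unit(sub(p2, p1))
    zeta = cross(cross(t, p1), t)  # (t x R) x t, constant along the edge
    def D(a, b, c):  # endpoint difference of x^(a-1) y^(b-1) z^(c-1)
        f = lambda p: p[0] ** (a - 1) * p[1] ** (b - 1) * p[2] ** (c - 1)
        return f(p2) - f(p1)
    E = lambda a, b, c: edge_moment(p1, p2, a, b, c)
    return (lowered(E, zeta, I, J, K) + shifted(D, t, I, J, K)) / (I + J + K - 2)


def face_moment(verts, I, J, K):
    """F(I,J,K) over a flat polygon, Eqn. (2.31)."""
    n = unit(cross(sub(verts[1], verts[0]), sub(verts[2], verts[0])))
    F = lambda a, b, c: face_moment(verts, a, b, c)
    edges = 0.0
    for p1, p2 in zip(verts, verts[1:] + verts[:1]):
        nu = cross(unit(sub(p2, p1)), n)  # in-plane outward edge normal t x n
        edges += shifted(lambda a, b, c: edge_moment(p1, p2, a, b, c), nu, I, J, K)
    return (dot(n, verts[0]) * lowered(F, n, I, J, K) + edges) / (I + J + K - 1)


def volume_moment(faces, I, J, K):
    """H(I,J,K) over the polyhedron bounded by `faces`, Eqns. (2.28)-(2.29)."""
    total = 0.0
    for verts in faces:
        n = unit(cross(sub(verts[1], verts[0]), sub(verts[2], verts[0])))
        total += shifted(lambda a, b, c: face_moment(verts, a, b, c), n, I, J, K)
    return total / (I + J + K)


# Unit cube, faces ordered counterclockwise as seen from outside.
CUBE = [
    [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)],  # x = 0
    [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],  # x = 1
    [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],  # y = 0
    [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)],  # y = 1
    [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)],  # z = 0
    [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],  # z = 1
]
volume = volume_moment(CUBE, 1, 1, 1)
centroid_x = volume_moment(CUBE, 2, 1, 1) / volume
```

The zero order moment H(1,1,1) gives the volume, and the ratios H(2,1,1)/H(1,1,1), H(1,2,1)/H(1,1,1), and H(1,1,2)/H(1,1,1) give the centroid used for evaluating the density.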

Identifying D-regions 

There remains, of course, the problem of identifying D-regions and their bounding 
surfaces. This is done in three stages. 

First, for each panel, a list of grid boxes containing any part of the panel is con- 
structed. Because the grid is rectangular and hierarchical in nature it is relatively 
easy to isolate the subset of boxes which are located within a neighborhood of a given 
panel. Moreover, because the boxes are rectangular and the panels are divided into 
flat triangles it is straightforward to determine if boxes in a neighborhood of a panel 
in fact contain any part of the given panel. This list is then inverted to find all the 
panels intersecting a given boundary box. 

In the second stage a list of equivalence classes of panel sides for each boundary 
box is constructed. A panel side is either the upper or lower surface of a panel. An 
equivalence class consists of all panel sides which are connected to each other through 
panel edges. A panel side is connected to another panel side if the two panels share 
a common edge that is partially or wholly contained within the given boundary box 
and if there is no intervening panel also connected to that edge. 
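
Grouping panel sides into equivalence classes is a connected-components problem. One common way to carry it out, shown here as an assumption rather than as the method actually used in the code, is a union-find structure. The geometric test that decides whether two panel sides are connected (shared edge inside the box, no intervening panel) is assumed to have been evaluated beforehand and supplied as a list of pairs; the names are hypothetical.

```python
def equivalence_classes(sides, connections):
    """Group panel sides into classes connected through shared panel edges.

    sides: iterable of hashable panel-side labels, e.g. ("panel_1", "upper").
    connections: iterable of (side_a, side_b) pairs judged to be connected.
    Returns a list of sets, one per equivalence class.
    """
    parent = {s: s for s in sides}

    def find(s):
        # Follow parent links to the class representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in connections:
        parent[find(a)] = find(b)

    classes = {}
    for s in parent:
        classes.setdefault(find(s), set()).add(s)
    return list(classes.values())


# Two adjoining panels form a surface with an upper and a lower side; a third
# panel in the same box is isolated, so its two sides remain separate classes.
sides = [(p, s) for p in ("p1", "p2", "p3") for s in ("upper", "lower")]
links = [(("p1", "upper"), ("p2", "upper")), (("p1", "lower"), ("p2", "lower"))]
classes = equivalence_classes(sides, links)
```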

In the third stage separate connected regions of the boundary box are identified. 
This is done by choosing points on different panel side equivalence classes and then 
joining them with straight lines. The set of panels cutting these straight lines is 
examined and the panel side equivalence classes of panels responsible for successive 
cuts are identified as members of a new equivalence class of panel sides which bound 
a connected region. Polygonal subsets of a face of the boundary box are included in
such an equivalence class whenever a panel side is discovered to intersect the face. 
This algorithm determines connected regions within a boundary box. However it is 
also necessary to determine which such regions are connected to regions in adjacent 
boxes. This is because of the necessity of maintaining continuity of element trial 
functions across box faces and edges. For this purpose a list of which panels in each 
boundary box region intersect box faces is stored. These intersections are compared 
with those in an adjacent box and connections between regions are established. The 

parameters at common nodes of connected regions are then identified. 

2.3.5 Grid Interfaces 

In the finite element method, conservation of mass results if the element trial functions
are continuous from box to box. This property can be retained in the presence of
grid refinement by introducing pseudo-unknowns. By definition a pseudo-unknown is
any unknown located at a node on the boundary of some element but not at a corner
of this element. This can occur only at a coarse to fine grid interface. In the two
dimensional case, pictured in Figure 2.5, Φ₁ is a pseudo-unknown whose parents are
Φ₂ and Φ₃. In the situation pictured in Figure 2.4, is a pseudo-unknown whose
parents are Φ_p and Ψ_p.

In order to maintain continuity of the element trial functions across element boundaries,
Φ₁ in Figure 2.5 must be the average of its parents, that is

\[ \Phi_1 = \tfrac{1}{2}(\Phi_2 + \Phi_3) \tag{2.34} \]

In three dimensions, pseudo-unknowns can occur at the midpoints of element edges
or the centers of element faces. For a pseudo-unknown Φ₁ at the center of an element
face with four parents Φ₂, Φ₃, Φ₄, and Φ₅,

\[ \Phi_1 = \tfrac{1}{4}(\Phi_2 + \Phi_3 + \Phi_4 + \Phi_5). \tag{2.35} \]

Thus, pseudo-unknowns are not true degrees of freedom and could be eliminated at
the outset from the element stiffness matrices through Equation (2.35). However,
this would result in loss of uniformity in these matrices, many special cases, and loss
of vectorization. Hence these unknowns are treated as degrees of freedom when the
element stiffness matrices are generated. In the process of evaluating the discrete
operator L, pseudo-unknowns are first assigned values by averaging their parent
unknowns. Residuals of the governing equations are produced at these unknowns but
are then distributed to the residuals for their parents. This process of distributing
residuals to parents is justified by Eqn. (2.35) and a straightforward application of
the chain rule

\[
\frac{dJ}{d\Phi_2} = \frac{\partial J}{\partial \Phi_2}
  + \frac{\partial J}{\partial \Phi_1} \frac{\partial \Phi_1}{\partial \Phi_2}
  = \frac{\partial J}{\partial \Phi_2} + \frac{1}{4} \frac{\partial J}{\partial \Phi_1}
\tag{2.36}
\]

Thus, the component of the residual of the discrete version of Eqn. (2.25) calculated
for Φ₁ should be equally distributed to the residuals for the four parent unknowns.
This technique has the advantage that every element stiffness matrix produces con- 
tributions only to the 8 corner unknowns of its box element, thereby simplifying the 
generation of the stiffness matrices and enhancing vectorization. Vectorizing over 
large blocks of similar elements can be done using an outer loop over the eight corner 
unknowns and an inner loop over the elements in the block. 
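
The two-step treatment of pseudo-unknowns (assign by averaging, then hand residuals back to the parents) can be sketched as follows. The dictionary-based storage and the function names are hypothetical, and the sketch assumes that every parent is a true unknown rather than another pseudo-unknown.

```python
def fill_pseudo_unknowns(values, parents):
    """Assign each pseudo-unknown the average of its parent unknowns, in place."""
    for child, plist in parents.items():
        values[child] = sum(values[p] for p in plist) / len(plist)


def distribute_residuals(residuals, parents):
    """Return residuals with each pseudo-unknown's share handed equally to its parents."""
    out = dict(residuals)
    for child, plist in parents.items():
        share = out.pop(child) / len(plist)  # chain-rule weight 1/len(parents)
        for p in plist:
            out[p] += share
    return out


# Face-centered pseudo-unknown 1 with four parents 2, 3, 4, 5.
parents = {1: [2, 3, 4, 5]}
values = {2: 1.0, 3: 3.0, 4: 5.0, 5: 7.0}
fill_pseudo_unknowns(values, parents)          # values[1] becomes 4.0
residuals = {1: 8.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 0.0}
true_residuals = distribute_residuals(residuals, parents)
```

The sum of the residuals is unchanged by the distribution, which reflects the conservation property that the continuity of the trial functions is meant to preserve.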


2.3.6 Modifications to the Bateman Principle 

To achieve a stable numerical formulation, the treatment of Dirichlet boundary con- 
ditions and wake surfaces must be modified. In addition, the natural Neumann condi- 
tion must be modified to account for boundary curvature, since the solution is often 
sensitive to this quantity and the boundary is discretized using flat panels. These 
modifications are described below.

Figure 2.5: Pseudo-Unknown in Two Dimensions.

The introduction of the last surface integral in the variational principle (2.16)
enforces a Dirichlet condition on ∂Ω₃. Equation (2.16) can then be used to calculate
the element stiffness matrices in a finite element formulation. It turns out that the
resultant discrete problem is somewhat unstable. In certain instances one can show
that the boundary unknowns Ψ actually satisfy a discrete Helmholtz equation and
an oscillatory solution is, in fact, obtained. This phenomenon is probably related to
the fact that J is no longer maximized in subsonic flow with Dirichlet data. This
suggests a remedy which has been implemented and which has been found to be very
reliable numerically. The last integral in Eqn. (2.16) is replaced by

/•J^*-** (2 - 37) 

where Δl is the minimum diameter of the box containing the trial function. A similar
term may be added to the second integral.

All surfaces are represented by flat panels. Resultant discontinuities in slope from
panel to panel will be reflected in the solution as the grid is refined. In most cases,
this effect is spurious, since the surface slope discontinuities are artifacts of the panel
description of the surface. To eliminate this problem, a curved surface is simulated
by adding to the variation of Eqn. (2.16) a surface integral

\[ \delta J = \delta J + \int \rho \nabla\Phi \cdot (\hat{n} - \hat{n}^{*}) \, \delta\Phi \, dS \tag{2.38} \]

where n̂* is a polynomial interpolation of n̂ and δΦ denotes the variation of Φ. The
endpoints for the polynomial interpolation of the normal are user controlled to allow
discontinuities in slope where they are actually present.

2.3.7 Dissipation 

First order upwinding of the density is used to produce the artificial viscosity required
when supersonic flow is present [22, 25]. Such upwinding is given by replacing ρ in
the full potential equation with

\[ \tilde{\rho} = \rho - \mu \hat{V} \cdot \Delta\rho \tag{2.39} \]

where V̂ is the normalized local velocity and Δ is an upwind undivided difference.
In Eqn. (2.39) μ is the switching function given by

\[ \mu = \max(1 - M_c^2 / M^2, \, 0) \tag{2.40} \]

where M is the local Mach number and M_c is the cut-off Mach number, assigned the
value M_c² = 0.95, chosen to introduce dissipation just below Mach 1.0.
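
The behavior of the switching function and of the upwinded density can be seen in a one dimensional sketch. It assumes flow in the +x direction on a uniform row of elements, so that the normalized velocity is +1 and the upwind undivided difference at element i is ρ_i − ρ_{i−1}. The function names are hypothetical.

```python
def switching(mach, mc2=0.95):
    """Switching function of Eqn. (2.40): zero below the cut-off, rising toward 1."""
    if mach <= 0.0:
        return 0.0
    return max(1.0 - mc2 / mach**2, 0.0)


def upwinded_density_1d(rho, mach):
    """Upwinded density for flow in +x; element 0 has no upwind neighbor."""
    out = [rho[0]]
    for i in range(1, len(rho)):
        # rho - mu * (upwind undivided difference), with unit velocity in +x
        out.append(rho[i] - switching(mach[i]) * (rho[i] - rho[i - 1]))
    return out


# Accelerating flow: the density is left unchanged where M^2 < 0.95 and is
# biased toward the upstream value in the supersonic element.
rho = [1.0, 0.8, 0.6]
mach = [0.5, 0.9, 2.0]
rho_tilde = upwinded_density_1d(rho, mach)
```

In the subsonic elements the switching function is exactly zero, so the discretization there is unaltered by the dissipation.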

An alternative to upwinding the density is to use flux biasing, i.e., to upwind the flux
ρq where q = ‖V‖₂. Flux biasing may be expressed in a form similar to Eqn. (2.39)
by writing

\[ \tilde{\rho} = \frac{1}{q} \left( (\rho q) - \hat{V} \cdot \Delta (\rho q)^{-} \right) \tag{2.41} \]

where

\[
(\rho q)^{-} =
\begin{cases}
\rho q & \text{for } M > 1 \\
\rho^{*} q^{*} & \text{for } M < 1
\end{cases}
\tag{2.42}
\]

Here, ρ*q* is the value of ρq at sonic flow conditions.

For either form of dissipation, the upwinding is done across the face of a box with 
a precomputed stencil. A density or flux is chosen for each of the six faces of an 
element when the operators are generated. For a uniform grid with no boundaries, 
each box has a single box adjacent to it across each of its six faces. In case of grid 
refinement, there are two other cases. If the adjacent box is refined, the density used 
for upwinding is obtained by averaging the densities for the four adjacent refined 
elements. If the adjacent box is coarser, then three densities are averaged. In two 
dimensions, the possibilities for upwinding to the left across an edge are illustrated 
in Figure 2.6. 

The upwinded density ρ̃ is defined by

\[
\tilde{\rho} = \rho + \mu \sum_{i=1}^{6} \max(-\hat{V} \cdot \hat{n}_i, \, 0) \, S_i(\hat{V})
    \sum_j c_{ij} (\rho_{ij} - \rho)
\tag{2.43}
\]

where i runs over the 6 faces of the given box, j runs over the densities averaged to
obtain the density upwinded to, c_ij is the coefficient for each of the four densities
contributing to the density upwinded to, V̂ is the normalized velocity at the centroid
of the given element, n̂_i is the outward pointing normal to face i of the element,
and S_i(V̂) is a cubic blending function to make the upwinding differentiable. This
upwinding is first order accurate, introducing an error comparable to replacing the
density ρ with a piecewise constant approximation in each element. In the case of
D-regions, special operators must be constructed based on local information about
box adjacency. This information is extracted from the oct-tree (see Appendix A) and
D-region lists in a preprocessing step.

Figure 2.6: Upwinding Stencils in Two Dimensions for Negative x Edge. (Panels: Face
Adjacent Box at Same Level; Face Adjacent Box is Less Refined; Face Adjacent Box is
More Refined.)


2.3.8 Accuracy of Discretization

By keeping the trial functions parameterized by values at the corners of similar boxes, 
uniformity of the basis [58] is guaranteed. Thus, in the limit, standard approximation 
theory and finite element error estimates hold. The asymptotic convergence of the 
method has been verified with uniform grids for the case of a sphere in incompressible 
flow where an analytic solution is available [36]. Sections 3.1 and 3.2 contain compu- 
tational examples that demonstrate the method’s accuracy for locally refined grids. 
There is no theoretical guarantee of good conditioning. However, experience to date 
with the code has not uncovered any conditioning problems for the cases in the range 
from 8000 to 600000 grid points and refined grids with relative levels of 10 below the 
global grid level.


2.4 SOLUTION ALGORITHM

2.4.1 Linear Solution Algorithm 

The solution technique used in TranAir was designed for nonlinear problems. However,
it is useful to first describe the algorithm as applied to the special case of a linear
boundary value problem.

Discrete System 

The generic linear potential equation 


$$\nabla \cdot (\rho \nabla \Phi) = f \tag{2.44}$$

is considered with $\rho = \rho(x, y, z)$ assumed to be given and strictly positive. The
boundary conditions are those described earlier in Section 2.1.

In order to enforce the far field condition given by Eqn. (2.3), source unknowns $Q$
are introduced on the global grid and replace the unknowns $\Phi$ there. Since $Q$ is known
to be zero on the boundary of the global grid, the residual does not need to be
computed there. The extrapolated values in boundary boxes are denoted by $\Psi$, all
other variables on the refined grid are denoted by $\Phi$, and the doublet parameters on
wakes are denoted by $\mu$. The finite element operator described in Section 2.3.4 will be
denoted by $L$. It is defined over the whole grid except on the boundary of the global
grid and is evaluated by multiplying the element stiffness matrices by the vector of
unknowns.

Thus it is necessary to solve the linear system of equations 


$$L \begin{pmatrix} T^{-1} Q \\ \Phi \\ \Psi \\ \mu \end{pmatrix} = f. \tag{2.45}$$


Preconditioned System 

Since the system (2.45) (depending on boundary conditions) is non-symmetric and 
non-definite the GMRES method of Saad and Schultz [53] is chosen as the basic 
iterative solver. 
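The following is a minimal sketch of unrestarted GMRES, included only to illustrate why the method suits this setting: it needs nothing from the operator except its action on a vector. It is not the TranAir implementation. The function name `gmres`, the zero initial guess, the absence of restarts, and the assumption of a nonsingular operator are all choices made for this example.

```python
import math

def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES with zero initial guess (illustrative sketch).

    matvec(v) must return the operator applied to the list v.
    The operator is assumed to be nonsingular.
    """
    n = len(b)
    beta = math.sqrt(sum(bi * bi for bi in b))
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]   # orthonormal Krylov basis vectors
    H = []                          # H[j]: column j of the rotated Hessenberg matrix
    cs, sn = [], []                 # Givens rotation coefficients
    g = [beta]                      # rotated right-hand side of the least squares problem
    for j in range(min(max_iter, n)):
        w = matvec(V[j])
        h = []
        for vi in V:                # modified Gram-Schmidt orthogonalization
            hij = sum(a * c for a, c in zip(w, vi))
            h.append(hij)
            w = [a - hij * c for a, c in zip(w, vi)]
        hnext = math.sqrt(sum(a * a for a in w))
        for i in range(j):          # apply the earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], hnext)  # new rotation eliminates the subdiagonal entry
        c, s = h[j] / r, hnext / r
        cs.append(c)
        sn.append(s)
        h[j] = r
        g.append(-s * g[j])
        g[j] = c * g[j]
        H.append(h)
        # |g[j + 1]| is the current residual norm
        if abs(g[j + 1]) <= tol * beta or hnext == 0.0:
            break
        V.append([a / hnext for a in w])
    k = len(H)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution on the upper triangular system
        acc = g[i] - sum(H[col][i] * y[col] for col in range(i + 1, k))
        y[i] = acc / H[i][i]
    return [sum(y[col] * V[col][i] for col in range(k)) for i in range(n)]

# Usage on a small non-symmetric system (values chosen only for illustration).
A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
apply_A = lambda v: [sum(a * x for a, x in zip(row, v)) for row in A]
solution = gmres(apply_A, [6.0, 11.0, 8.0])
```

A left preconditioner can be accommodated in such a sketch by passing a `matvec` that applies the preconditioner after the operator and by preconditioning the right-hand side in the same way.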

The operator $T^{-1}$, used to obtain the potential from the sources, acts as an effective
right preconditioner for the global grid points. It is also necessary to use a left
preconditioner to approximate the problem near the internal boundaries. The left
preconditioner, $N$, is taken to be the global stiffness matrix restricted to a reduced set
of unknowns. The reduced set is defined to consist of all unknowns located at corners
of boundary boxes, refined boxes, or boxes with total pressure or temperature different
from free stream values, and the doublet parameters $\mu$. The stagnation unknowns
(those that are located in the interior of the configuration) are not included in the
reduced set. The reduced set is closed by closure unknowns, which are outside the
reduced set but in the stiffness matrix stencil of some unknown in the reduced set.
The boundary condition at closure unknowns is an approximation to the far field
condition for the original problem, i.e., $\phi = 0$.

Note that there is some overlap between the $Q$ unknowns on the global grid that are
preconditioned by $T^{-1}$ and those in the reduced set preconditioned by $N^{-1}$. For unknowns
$Q$ at global grid points in the reduced set, an additional preconditioner $T$ must be
applied on the left to make the equation dimensionally correct.

Hence, in all there are five classes of unknowns in a given flow problem. They are: 

• $Q^{(1)}$, the source unknowns at global grid points which are not in the reduced set
and not in stagnation regions;

• $Q^{(2)}$, the source unknowns at global grid points in the reduced set or in stagnation
regions;

• $\Phi$, the values of the velocity potential at points on locally refined grids;

• $\Psi$, the values of velocity potential in the boundary basis functions;

• $\mu$, the doublet strengths at leading edges of wake networks.

The preconditioned equation can then be written as 


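$$T N^{-1} (f - L T^{-1} X) = 0 \tag{2.46}$$

where

$$X = \begin{pmatrix} Q^{(1)} \\ Q^{(2)} \\ \Phi \\ \Psi \\ \mu \end{pmatrix}. \tag{2.47}$$

The operators $T$ and $N$ are defined as block matrices in Eqns. (2.48) and (2.49).

[Eqns. (2.48) and (2.49): block matrix definitions of $T$ and $N$. Only the blocks
$T^{(1)(1)}$ and $T^{(2)(1)}$ of $T$ are legible.]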

To achieve invariance with respect to units, the source unknowns Q must be scaled 
relative to the potential unknowns (the scale factor has the dimension of the inverse 
length squared). After scaling, the GMRES convergence history is independent of 
the physical units used to define the problem.

Preconditioned Residual

The calculation of the preconditioned residual R (the function evaluation subroutine 
for GMRES) is now described. 

For unknowns $Q^{(1)}$ (those on the global grid but not in the interior of the reduced
set)

$$R(Q^{(1)}) = f - L T^{-1} X. \tag{2.50}$$


For unknowns $Q^{(2)}$ (those on the global grid and in the interior of the reduced set
or located at global grid points in stagnation regions)

$$R(Q^{(2)}) = T N^{-1} (f - L T^{-1} X). \tag{2.51}$$



In Eqn. (2.51), special account must be taken of unknowns $Q$ located in stagnation
regions, such as the interior of wings and fuselages. For these unknowns, it is
important to realize that $N^{-1}$ is just the identity, $f = 0$, and $L\Phi = \phi$. Thus $T$ is applied to
the global grid unknowns in the reduced set, closure point unknowns, and stagnation
unknowns. But the input for $T$ at stagnation unknowns comes from a different
process than that used for the other two classes of $Q$ unknowns. Another special class
of $Q$ unknowns in Eqn. (2.51) are those at closure points. $N^{-1}$ does apply to the
residual at these points, producing input values for $T$ to give residual values for points
in the reduced set. But for these closure unknowns, the residual is actually given by
Eqn. (2.50).

For unknowns $\Phi$ not located on the global grid and for all unknowns $\Psi$ and $\mu$,

$$\begin{pmatrix} R(\Phi) \\ R(\Psi) \\ R(\mu) \end{pmatrix} = N^{-1} (f - L T^{-1} X). \tag{2.52}$$


For $\Phi$ unknowns located at points not in the global grid but in stagnation regions,
the residual is given by $R(\Phi) = L\Phi = \phi$.


Preconditioners 

The operator $T^{-1}$ represents the discrete Green’s function and is defined over the
uniform global grid. Construction and application of the discrete Green’s function
(Poisson Solver) is extremely fast since one can take advantage of the constant grid
spacing and use discrete Fourier transforms. More details on this preconditioner are
given in Appendix D.
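To illustrate why constant grid spacing makes a transform-based Poisson solver possible, the sketch below solves a one-dimensional model problem with zero boundary values using a discrete sine transform. It is a simplified stand-in, not the solver of Appendix D: the one-dimensional setting, the Dirichlet conditions, the naive O(n^2) transform (in place of a fast transform), and the names `dst` and `poisson_solve_1d` are all assumptions of the example.

```python
import math

def dst(v):
    """Naive O(n^2) discrete sine transform (type I) of the list v."""
    n = len(v)
    return [
        sum(v[i] * math.sin(math.pi * (i + 1) * (k + 1) / (n + 1)) for i in range(n))
        for k in range(n)
    ]

def poisson_solve_1d(f, h):
    """Solve (-u[i-1] + 2*u[i] - u[i+1]) / h**2 = f[i] with u = 0 at both ends.

    On a uniform grid the sine modes are eigenvectors of the difference
    operator, so the solve reduces to a division in transform space.
    """
    n = len(f)
    f_hat = dst(f)
    u_hat = [
        f_hat[k] * h * h / (2.0 - 2.0 * math.cos(math.pi * (k + 1) / (n + 1)))
        for k in range(n)
    ]
    # The type I sine transform is its own inverse up to the factor 2 / (n + 1).
    return [2.0 / (n + 1) * value for value in dst(u_hat)]

# Usage: recover a known grid function from its discrete Laplacian.
n, h = 7, 0.125
u_exact = [math.sin(0.3 * i) + i for i in range(1, n + 1)]
rhs = [
    (2.0 * u_exact[i]
     - (u_exact[i - 1] if i > 0 else 0.0)
     - (u_exact[i + 1] if i < n - 1 else 0.0)) / h ** 2
    for i in range(n)
]
u = poisson_solve_1d(rhs, h)
```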

The left preconditioner matrix N is sparse and it is feasible to do a direct sparse 
incomplete factorization of N. This works for the following reasons. First, a drop tol- 
erance can be introduced into the sparse elimination process allowing small elements 
in the decomposition to be dropped as they are generated. This has a cascading effect 
and reduces fill dramatically [43]. Second, a grid based nested dissection ordering can 
be generated which reduces fill during elimination and therefore the total amount 
of work. In most cases the drop tolerance is the most effective strategy. Figure 2.7 
shows the reduced set and a possible first dissector for a grid for a sphere case. More 
details on the sparse solver preconditioner are given in Appendix E. 
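The effect of a drop tolerance can be seen in the following small sketch of an incomplete LU factorization. It uses dense storage and no pivoting or reordering for brevity, so it is only a model of the idea and not the sparse solver of Appendix E; the function names and the absolute (rather than relative) drop test are assumptions of the example.

```python
def ilu_drop(A, drop_tol):
    """Incomplete LU of A without pivoting, stored in one matrix.

    The strict lower triangle holds the multipliers (unit diagonal implied),
    the upper triangle holds U. Multipliers and updated off-diagonal entries
    whose magnitude falls below drop_tol are set to zero as they are generated.
    """
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            if LU[i][k] == 0.0:
                continue
            LU[i][k] /= LU[k][k]
            if abs(LU[i][k]) < drop_tol:
                LU[i][k] = 0.0          # dropped multiplier: no update of row i
                continue
            for j in range(k + 1, n):
                if LU[k][j] != 0.0:
                    new = LU[i][j] - LU[i][k] * LU[k][j]
                    LU[i][j] = 0.0 if (i != j and abs(new) < drop_tol) else new
    return LU

def count_nonzeros(M):
    return sum(1 for row in M for value in row if value != 0.0)

# Usage: an "arrow" matrix whose exact factors fill in completely.
n = 5
A = [[10.0 if i == j else (1.0 if i == 0 or j == 0 else 0.0) for j in range(n)]
     for i in range(n)]
exact = ilu_drop(A, 0.0)       # drop_tol = 0 gives the complete factorization
sparse = ilu_drop(A, 0.05)     # small multipliers generated later are dropped
```

Because a dropped entry never generates further fill, each discarded element also removes the entries it would have produced later in the elimination, which is the cascading effect referred to above.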



Figure 2.7: Reduced Set and Possible First Dissector. 


2.4.2 Nonlinear Solution Algorithm

When the problem is nonlinear it is necessary to use the Newton method. Each step 
of the Newton method requires the solution of a linear problem of the type discussed 
in Section 2.4.1. 

Newton Method 

Consider the nonlinear system of equations 

$$F(x) = 0. \tag{2.53}$$

Given an initial approximate solution $x^0$, for $n = 0, 1, 2, \ldots$ until the residual is
sufficiently small, set

$$x^{n+1} = x^n + \lambda (\delta x^{n+1}) \tag{2.54}$$

where $\delta x^{n+1}$ is the solution of the linear system

$$\mathcal{F}_{x^n}(\delta x^{n+1}) = -F(x^n) \tag{2.55}$$

and $\lambda$ is a step length to be determined. Here $\mathcal{F}_{x^n}$ is the Jacobian for $F$ linearized
about $x^n$. This linear operator can be defined by giving its action on any vector $y$:

$$\mathcal{F}_x(y) = \lim_{\epsilon \to 0} \frac{F(x + \epsilon y) - F(x)}{\epsilon}. \tag{2.56}$$

The step length $\lambda$ is selected so that

$$\|F(x^{n+1})\| < \|F(x^n)\| \tag{2.57}$$

in some appropriate norm.
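A minimal scalar sketch of a damped Newton iteration is given below. The step length is simply halved until the residual decreases; this plain halving rule is an assumption made for illustration and is not one of the specific damping strategies used in the code. The function name `damped_newton` and the test function are likewise hypothetical.

```python
import math

def damped_newton(F, dF, x, tol=1e-12, max_steps=50):
    """Newton's method for a scalar equation F(x) = 0 with step halving.

    The full Newton step dx is scaled by a step length lam, which is halved
    until the residual magnitude decreases.
    """
    for _ in range(max_steps):
        fx = F(x)
        if abs(fx) < tol:
            break
        dx = -fx / dF(x)
        lam = 1.0
        while abs(F(x + lam * dx)) >= abs(fx) and lam > 1e-10:
            lam *= 0.5
        x += lam * dx
    return x

# Usage: for F(x) = atan(x) started at x = 3, the undamped Newton step
# overshoots and increases the residual, so the step must be shortened.
root = damped_newton(math.atan, lambda t: 1.0 / (1.0 + t * t), 3.0)
```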

The GMRES algorithm can be used to solve Eqn. (2.55). This algorithm requires
only the ability to calculate the action of the linear operator $\mathcal{F}_x$ on any vector $y$.
Equation (2.56) can be used to approximate this action:

$$\mathcal{F}_x(y) \approx \frac{F(x + \epsilon y) - F(x)}{\epsilon} \tag{2.58}$$

where $\epsilon$ is small in some appropriate sense. Thus, the linear problem, Eqn. (2.55),
can be solved without ever explicitly generating the Jacobian for the full nonlinear
problem.
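The finite difference approximation of the Jacobian action can be sketched in a few lines. The function name `jacobian_action`, the fixed default value of `eps`, and the small test system are assumptions made for the example; the choice of the increment in practice depends on the scaling of the problem.

```python
def jacobian_action(F, x, y, eps=1e-7):
    """Approximate the Jacobian of F at x applied to y by a forward difference.

    F maps a list of floats to a list of floats. Only two evaluations of F
    are needed and the Jacobian matrix itself is never formed.
    """
    Fx = F(x)
    Fx_shifted = F([xi + eps * yi for xi, yi in zip(x, y)])
    return [(a - b) / eps for a, b in zip(Fx_shifted, Fx)]

# Usage on a small nonlinear system with a known Jacobian:
#   F(x) = (x0**2 + x1, x0 * x1),  J = [[2*x0, 1], [x1, x0]].
F = lambda x: [x[0] ** 2 + x[1], x[0] * x[1]]
Jy = jacobian_action(F, [1.0, 2.0], [3.0, 4.0])
```

For this test system the exact product of the Jacobian at (1, 2) with the vector (3, 4) is (10, 10), and the approximation agrees with it to within the truncation error of the forward difference.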

To control the cost of the method, Eqn. (2.55) is solved only approximately with
GMRES, i.e., $\delta x^{n+1}$ satisfies

$$\|\mathcal{F}_{x^n}(\delta x^{n+1}) + F(x^n)\| < \eta.$$

This makes the method an inexact Newton method [59]. If $\eta$ is constant, the method
converges linearly. If $\eta$ goes to 0 as convergence takes place, the convergence is
superlinear. More details can be found in [54].

Preconditioning GMRES 

Preconditioning of Eqn. (2.55) is identical to that used for linear systems and given in
Eqn. (2.46). If $f$ is replaced by $-F(x^n)$, $L$ by $\mathcal{F}_{x^n}$, and $T^{-1}X$ by $x$, Eqn. (2.46) is the
same as Eqn. (2.55) preconditioned on the left by $TN^{-1}$. For convenience, the finite
difference formula (2.58) is applied to $TN^{-1}F$ rather than $F$. The matrix forms of
$T$, $N$, and $T^{-1}$ are given in Section 2.4.1.

The matrix $N$ is an approximation to the Jacobian for $F$ about the current solution
restricted to a reduced set as described in Section 2.4.1 above. The reduced set now
also includes all elements where upwinding is used.

The matrix $N$ is generated on an element by element basis using the element
stiffness matrices. The density function $\rho$ and its derivatives are evaluated at element
centroids. For unknowns one of whose eight contributing elements has upwinding in
effect, the row of the matrix $N$ depends on more than just these eight elements. The
algorithm can be simplified by applying the chain rule to the calculation of a matrix
entry,

$$\frac{dF(\Phi)_i}{d\Phi_j} = \frac{\partial F(\Phi)_i}{\partial \Phi_j} + \sum_k \frac{\partial F(\Phi)_i}{\partial \rho_k} \frac{\partial \rho_k}{\partial \Phi_j}, \tag{2.59}$$

where $\rho$ is given by Eqn. (2.39). The first term on the right is the contribution from
the subsonic stencil, i.e., the element stiffness matrices. The second term is generated
using a sparse matrix-matrix multiply. This technique enables vectorization even
though the upwinding is element dependent.

The convergence of the inexact Newton method depends on how well the matrix
$N$ represents the Jacobian of $F$. If the damping strategies described below are used,
it is usually necessary to compute and invert the matrix $N$ only infrequently.


Local Damping of Newton Method 

Newton’s method is rarely globally convergent. Also, its convergence rate is generally 
quadratic only sufficiently close to the solution. The initial iterate is usually taken
to be $\phi = 0$, which is not a good approximation to the solution. Thus, Newton’s
method works well only for moderate to small problems or those with only weak or 
no shocks. For large problems or problems with reasonably strong shocks, Newton’s 
method must be damped to prevent divergence or very slow convergence. 

Various damping strategies have been tested in the present method. One due to
Bank and Rose [60] for determining the step length $\lambda$ is based on the residual for
Eqn. (2.1). This strategy is fairly simple to implement and provides adequate local
damping in many cases.

Another strategy is to limit $\lambda$ to prevent local Mach numbers greater than some
prescribed cut off value. This prevents spurious large velocities from causing stagna- 
tion of convergence. In the ONERA M6 wing results reported below, this strategy 
was used with a local Mach number cutoff of \/E. 

However, local damping strategies of this kind are only effective by themselves in 
cases that almost converge anyway. In more difficult transonic cases, a steep shock 
can form in the wrong location early in the iterative process and the Newton method 
can stagnate. In this case, a local method can rarely move the shock more than one 
grid point per iteration, resulting in very slow convergence. This situation seems to 
be due to the fact that the residual is much larger near the shock than elsewhere. 

The shortcomings of the local damping strategies can be seen in the case of the
ONERA M6 wing at $M_\infty = 0.84$ and angle of attack $\alpha = 3.06°$ on a grid having about
311,000 elements. This case exhibits a strong shock outboard as well as an oblique
shock. If Newton’s method is used with an initial iterate $\phi = 0$ and the Bank-Rose
strategy for limiting $\lambda$, the convergence stagnates at the iterate shown in Figure 2.8.
The final converged solution is shown for reference. If $\lambda$ is further limited to control
local Mach numbers as described above, the convergence is still very slow. Figure
2.9 shows the Newton iterate after six and twelve Newton steps. Newton’s method is
moving the shock toward the correct location very slowly.

Viscosity Damping 

To improve convergence in the presence of shock waves, a problem dependent dissi- 
pation is used. Here a larger amount of dissipation is introduced during the early 
iterations and it is reduced to appropriate levels as the solution develops. This type 
of damping strategy can be implemented through a continuation process which can 
be based on many types of parameters. 

In the first and more direct approach (called the viscosity damping strategy), the
discrete problem is modified by multiplying the switching function of Eqn. (2.54) by 
a constant factor (1.5 to 3.0) and by reducing the cut-off Mach number during the 
initial steps in the Newton method. This has the effect of increasing the amount 
of artificial viscosity and applying it to a larger part of the flow field. After several 
Newton steps, the problem is modified by reducing the multiplying factor and raising 
the cut-off Mach number. This process is repeated until the desired level of dissipation 
is reached. This continuation process works very well since it has the effect of locating 
the supersonic zone and the shock position fairly early in the process, even though 
the shock is quite smeared. 

When viscosity damping is used in the case of the ONERA M6 wing, convergence
improves considerably after the initial viscous problems are partially solved.
A partially converged solution at the second continuation step (Newton step seven) 
is shown in Figure 2.10. Figure 2.11 shows the convergence histories for these runs. 
The residual jumps in this figure correspond to discrete changes in the continuation 
parameter. The drawback of this continuation approach is the high cost of even 
partially solving the viscous problems that are introduced. 

Several other parameters were used as continuation parameters, including free
stream Mach number ($M_\infty$) and the total pressure of the free stream. In both cases,
the shock location was sensitive to the continuation parameter and convergence was
poor in certain cases.

2.4.3 Grid Sequencing 

A strategy that has proven to be very reliable for ensuring convergence for difficult
transonic problems is grid sequencing. Basically the process involves the following
steps. A sequence of coarse to fine grids is generated a priori. The solution is found
on the coarsest grid. The converged solution is interpolated onto the next finer grid
and the problem is solved on that grid. This is repeated until the solution is obtained
on the finest grid. A gradual change in viscosity is brought about by the fact that the
grid cell size in the initial stages (on coarse grids) is larger and thus the dissipation
is larger. As the grid becomes finer the dissipation is automatically reduced. The
process of interpolating the solution naturally positions the nonlinear features in the
solution. It is also possible to employ viscosity damping on any or all the grids in
the sequence with some simple code modifications. The benefits of this approach are
more reliable convergence and lower computer cost.

Figure 2.8: Iterate for Newton’s Method With Residual Damping for ONERA M6
Wing Case, $M_\infty$ = 0.84, $\alpha$ = 3.06°, 91% Span Station. (Curves: Converged Solution;
Standard Newton Iterate.)

Figure 2.9: Iterates for Newton’s Method With Residual and Local Mach Number
Damping for ONERA M6 Wing Case, $M_\infty$ = 0.84, $\alpha$ = 3.06°, 91% Span Station.
(Curves: Converged Solution; 6th Newton Iterate; 12th Newton Iterate.)

Figure 2.10: Partially Converged Iterate for the Second Continuation Step Using
Viscosity Damping for ONERA M6 Wing Case, $M_\infty$ = 0.84, $\alpha$ = 3.06°, 91% Span
Station. (Curves: Converged Solution; 6th Newton Iterate; 7th Viscous Damping
Iterate.)

Figure 2.11: Convergence Histories for Newton’s Method with Various Damping
Strategies for ONERA M6 Wing Case, $M_\infty$ = 0.84, $\alpha$ = 3.06°. (Vertical axis: Relative
Residual. Curves: Residual Damping; Residual and Local Mach Number Damping;
Residual, Local Mach Number, and Viscosity Damping.)

The results for the ONERA M6 case are presented below. With grid sequencing,
this case converged rapidly. As discussed in Section 2.4.2, this case did not converge
when residual and local Mach number damping were used with Newton’s method with
an initial guess of $\phi = 0$. Convergence was obtained in two ways. Initially, viscosity
damping was used and it was found that four continuation steps were required. With
grid sequencing, this case converged more rapidly and CPU times were proportionally
reduced. Figure 2.12 compares convergence histories for these three methods.
Iterations in the grid sequencing run are scaled by the approximate size of the problem
for the early small grids (this scaling corresponds approximately to CPU cost). Grid
sequencing offers a substantial advantage in both rate of convergence and storage
requirements. For the grid sequencing run, CPU time was about half of that needed
for the viscosity damping run.

Cuts through the three grids used are shown in Figures 2.13 through 2.16. The 
final fine grid is the last of these three grids. The grids had about 19,000, 56,000, and 
311,000 elements respectively. Figure 2.17 shows surface pressures obtained on the 
three grids used in this case. On the coarser grids, the shock is in the right location 
but smeared. 


2.4.4 Solution Adaptive Grids 

The grid sequencing method described in Section 2.4.3 operates on a set of pre- 
constructed grids in which the refinement is governed a priori by the configuration 
surface definition and other specifications (see Section 2.3.3). The solution adaptive 
grid method starts with the coarsest grid in the preconstructed sequence of grids. 
The overall procedure in discretizing the problem on the current grid and solving 
the discrete equations is identical to that described so far (see Sections 2.3 and 2.4). 
However, the solution adaptive method differs from the grid sequencing method in two
regards. The first difference is that in the adaptive method, the next grid is generated
anew and the local resolution of the new grid is determined by a posteriori computed
local error estimates and user inputs (rather than being taken from the predetermined
sequence). The second difference is that in creating the new grid single-level
local refinement as well as derefinement may be used. In refining, a rectangular box 
element is replaced by eight smaller similar elements, whereas, in derefining, eight 
sibling elements are coalesced to form a larger similar element. 
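The refinement and derefinement operations on a single box can be sketched as follows. The `Box` class and its field names are hypothetical and carry only the geometry and the parent-child links; the actual oct-tree data structure is the one described in Appendix A.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    lo: tuple                 # coordinates (x, y, z) of the minimum corner
    size: tuple               # edge lengths (dx, dy, dz)
    level: int = 0
    children: list = field(default_factory=list)

    def refine(self):
        """Replace this leaf by eight similar children, one per octant."""
        half = tuple(s / 2.0 for s in self.size)
        self.children = [
            Box(tuple(self.lo[a] + ((k >> a) & 1) * half[a] for a in range(3)),
                half, self.level + 1)
            for k in range(8)
        ]

    def derefine(self):
        """Coalesce the eight siblings into this box, if all of them are leaves."""
        if all(not child.children for child in self.children):
            self.children = []

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

# Usage: refine a box, refine one of its children, then undo both steps.
root = Box((0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
root.refine()
root.children[0].refine()
leaf_count = len(root.leaves())
```

Each child has half the edge length of its parent in every direction, so the children are similar to the parent and their volumes sum to the parent volume.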

The goal of the adaptive grid method is to obtain a final grid with a (specified) 
target number of elements, N, and a numerical solution on that grid that is nearly 
as accurate as the best solution one could obtain using N elements in any grid. To 
achieve this goal five sequential steps are carried out in creating each new grid. These
steps consist of estimating local errors on the previous grid, computing local error
predictors, applying a priori grid refinement controls, applying a grid refinement
strategy, and constructing the new grid.

Figure 2.12: Convergence Histories for Newton’s Method, Newton’s Method with
Viscosity Damping, and Grid Sequencing for ONERA M6 Wing Case, $M_\infty$ = 0.84,
$\alpha$ = 3.06°. (Vertical axis: Relative Residual. Curves: Standard Newton Damping;
Viscosity Damping; Grid Sequencing.)

Figure 2.13: Cuts Through The Coarse and Medium Grids Generated by Grid
Sequencing for ONERA M6 Wing at 91% Span, $M_\infty$ = 0.84, $\alpha$ = 3.06°. (Panels: Coarse
Grid; Medium Grid.)

Figure 2.14: Cut Through the Fine Grid Generated by Grid Sequencing for ONERA
M6 Wing at 91% Span, $M_\infty$ = 0.84, $\alpha$ = 3.06°.

Use of the method components in these steps is novel in the present context, but 
the basic ideas behind them have been proposed before by other researchers [61]-[68]. 
The performance of the present solution adaptive grid method depends strongly
on the characteristics of individual applications and the specific method
components employed. The specific method components were chosen after substantial,
but nonexhaustive, testing and analysis.

Computing Local Error Estimates 

Local differences of velocity components are used as estimates of the error for each
rectangular¹ element in the grid. The error estimate for an element consists of

$$\mathrm{err}_{est} = \max_r \left\{ \max_j \left\{ (\Delta v_1^{r,j})^2 + (\Delta v_2^{r,j})^2 + (\Delta v_3^{r,j})^2 \right\}^{1/2} \right\}, \tag{2.60}$$

where, for the $r$th solution region contained in the element, $\Delta v_i^{r,j}$ denotes the
difference across the element’s $j$th face of the region centroid values of the $i$th velocity
component. The outer maximum in Eqn. (2.60) is taken over all regions contained
in the element. The inner maximum is taken over all element faces connecting to
a region not contained in a larger element. Figure 2.18 illustrates, in the case of a
two dimensional airfoil, the directions in which velocity components are differenced to
compute error estimates for five elements, labeled A-E, each of which contains only
one solution region.

¹ It should be recognized that the finite element method is applied on the regions over which the
element trial functions are defined. These regions are part of the rectangular box elements. The
grid refinement process refines box elements, and new regions are determined for the subdivided box
elements. The grid refinement process does not refine individual regions.

Figure 2.15: Cuts Through the Coarse and Medium Grids Generated by Grid
Sequencing for ONERA M6 Wing at the Plane of Symmetry, $M_\infty$ = 0.84, $\alpha$ = 3.06°.
(Panels: Coarse Grid; Medium Grid.)

Figure 2.16: Cut Through the Fine Grid Generated by Grid Sequencing for ONERA
M6 Wing at the Plane of Symmetry, $M_\infty$ = 0.84, $\alpha$ = 3.06°.

This error measure provides a natural way to detect flow features having different 
length scales near different configuration components and does not lead to excessive 
grid in the far field. 
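The nested maximum in the error estimate can be written directly as a short function. The input layout (a list of regions, each holding a list of velocity difference triples, one per eligible face) and the name `element_error_estimate` are assumptions made for this sketch; gathering the differences from the grid is not shown.

```python
import math

def element_error_estimate(regions):
    """Error estimate for one element.

    regions: one entry per solution region contained in the element; each
    entry is a list with one (dv1, dv2, dv3) triple per eligible face, holding
    the differences of the three velocity components across that face.
    """
    return max(
        max(math.sqrt(d1 * d1 + d2 * d2 + d3 * d3) for d1, d2, d3 in faces)
        for faces in regions
    )

# Usage: an element with two regions; the first region has two eligible faces.
estimate = element_error_estimate([
    [(3.0, 4.0, 0.0), (0.0, 0.0, 1.0)],
    [(1.0, 2.0, 2.0)],
])
```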

Computing Local Error Predictors 

A simple local smoothing algorithm is used to form error predictors from the local 
error estimates. In this algorithm, nodal values are first set equal to the largest of 
the error indicators for adjacent elements and then interpolated at element centroids 
to form the predictors. This algorithm implicitly predicts the need for grid refine- 
ment near elements where large errors have been detected and spreads the effect of 
estimated errors by one or two elements, thus preventing holes in subsequent grids. 
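A one-dimensional analogue of this smoothing step is sketched below. The reduction to a chain of equal elements, the use of a simple average of the two end nodes as the centroid interpolation, and the function name `error_predictors_1d` are assumptions made for illustration.

```python
def error_predictors_1d(estimates):
    """Form error predictors from error estimates on a 1D chain of elements.

    Node k lies between elements k - 1 and k. Each node takes the largest
    estimate of its adjacent elements, and the predictor of an element is the
    nodal data interpolated at its centroid (the mean of its two end nodes).
    """
    n = len(estimates)
    nodal = [max(estimates[max(k - 1, 0):min(k, n - 1) + 1]) for k in range(n + 1)]
    return [0.5 * (nodal[i] + nodal[i + 1]) for i in range(n)]

# Usage: a single large estimate is spread to the neighboring elements.
predictors = error_predictors_1d([0.0, 0.0, 4.0, 0.0, 0.0])
```

In this example the isolated estimate of 4.0 produces predictors of 2.0 in the two adjacent elements, so that a subsequent refinement would not leave the flagged element surrounded by unrefined neighbors.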

Applying A Priori Grid Refinement Controls 

For accurate and efficient analyses of complex configurations, it is important that
one is permitted to communicate regions of greatest and least interest to a solution
adaptive grid code. In many cases this is essential so that flow features of interest can
be resolved most accurately with a given target number of elements. The reasons for
this are as follows. Unless otherwise instructed, the solution adaptive grid method
gives equal weight to all regions with equal estimated errors in considering local grid
refinement. Such regions could include those about wing tips, leading edges of many
components in a configuration, irregularities in geometry, and wake regions, not all
of which may be of equal importance to a person using the code. It can also happen
that one flow feature that is easily detectable may dominate another flow feature that
is latent (a feature that cannot be detected until a sufficiently fine grid is present). For
a given target number of elements, one often can significantly enhance the detection
and resolution of latent flow features by restricting or de-emphasizing grid refinement
in regions with dominant features.

Figure 2.18: Directions for Velocity Component Differences in Error Indicators for
Elements A-E

The mechanism for exercising such a priori grid refinement controls in the solution 
adaptive grid method is the use of the same hexahedral shaped special boxes of interest 
(LBOs) used in constructing the initial fine grid (see Section 2.3). With each LBO, 
one specifies trilinearly varying minimum and maximum local grid sizes allowed in the 
LBO and a weight used to scale the corresponding local error predictors. The results 
of applying these controls in the method are an element refinement/derefinement 
eligibility list, a list of elements whose sizes are above specified maximum values, and 
a list of scaled error predictors ordered by size. 

Applying Grid Refinement Strategy 

In this step each element is marked for refinement, marked for derefinement, or
retained unchanged. Of the elements eligible for refinement, those with the largest
scaled error predictors are marked for subdivision, with elements having grid sizes 
above specified maximum values taking precedence. Of the elements eligible for dere- 
finement, those with the smallest scaled predictors are marked for derefinement (if all 
eight sibling elements are so eligible). The decisions regarding how many elements to 
refine and how many to derefine depend on a grid refinement strategy. 

In examining strategies for deciding how many elements of each type to mark, 
two principles have proven useful. First, direct control should be exercised over the 
rates at which numbers of elements in successive grids increase. This excludes, for 
example, a prevalent strategy that refines/derefines solely on the basis of cut-off values
which are proportional to the current mean local error indicator. Direct control is 
important because problem size can increase very rapidly with grid refinement in 
three dimensions. Second, for early and intermediate grids, grid refinement should be 
limited in regions where dominant flow features have been detected and be forced to 
occur in other regions. Failure to adhere to this principle can allow some flow features 
(e.g., leading edge expansions) to attract all available grid before other important 
features (e.g., shocks) develop. 

A simple and flexible strategy following these principles is incorporated in the 
present method. It consists of 

• refining and derefining fixed percentages of elements for most coarse and
intermediate grids,

• attempting to more equally distribute local errors without significantly changing
the number of elements in an intermediate grid, and

• only refining on the last grid.

More specifically (using a particular choice of input parameters for the method), with
given intermediate and final target numbers of elements $N_I$ and $N_F$, respectively, and
$N$ denoting the number of elements in the current grid, such that $N_I < N_F < 4 N_I$
and $N < N_I$:

if $N < 0.4 N_I$, 20% of the elements are marked for refinement and up to 40% for
derefinement;

if $0.4 N_I < N < 0.9 N_I$, only refinement is used to increase the number of elements to
about $N_F$; and

if $N$ is approximately equal to $N_I$, 2% of the elements are refined and up to 20%
are derefined, and the next (final) grid adaptation consists of (only) refining enough
elements so that the final grid has about $N_F$ elements.
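The marking of fixed percentages of elements can be sketched as follows. The function name `mark_elements` is hypothetical, and the sketch deliberately omits the eligibility lists, the precedence given to elements above their maximum size, and the requirement that all eight siblings be eligible before derefinement.

```python
def mark_elements(scaled_predictors, refine_frac, derefine_frac):
    """Return (refine, derefine) index sets chosen by fixed fractions.

    The elements with the largest scaled error predictors are marked for
    refinement and those with the smallest for derefinement; the two sets
    are kept disjoint.
    """
    n = len(scaled_predictors)
    order = sorted(range(n), key=lambda i: scaled_predictors[i])
    n_refine = int(round(refine_frac * n))
    n_derefine = min(int(round(derefine_frac * n)), n - n_refine)
    refine = set(order[n - n_refine:])
    derefine = set(order[:n_derefine])
    return refine, derefine

# Usage with the fractions quoted for a grid well below the intermediate
# target: 20% marked for refinement, up to 40% for derefinement.
predictors = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2, 0.8, 0.4, 0.6, 1.0]
to_refine, to_derefine = mark_elements(predictors, 0.20, 0.40)
```

Choosing the counts directly, rather than through an error cut-off value, is what gives the direct control over the growth in the number of elements discussed above.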

It is noted that the implementation of this and related grid refinement strategies
consists of “solution adaptive grid cycles”, where in each cycle, input consists of a
target number of elements and various control parameters, and one or more adaptive
grids are constructed. In the specific strategy described above, three cycles are used
with target numbers of elements equal to $N_I$, $N_I$, and $N_F$, respectively. In the second
cycle only one adaptive grid is constructed.

Constructing a New Grid 

Using a list of marked elements and a grid legalization constraint, a grid is constructed 
by building a new oct-tree structure. The legalization constraint (see Appendix A)
requires that additional elements be marked for refinement, if necessary, in order to 
prevent face-neighbor and edge-neighbor elements in the resulting grid from differing 
by more than one refinement level. In the newly constructed grid, the uniform global 
grid is expanded on the inflow or outer boundaries whenever a global grid box in the 
previous grid on the respective boundary is marked for refinement. Since linear flow 
assumptions are used on all inflow and outer boundary grid boxes, expansion of the 
grid (and problem domain) occurs whenever significant nonlinear effects are present 
in the flow near these boundaries. 
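The effect of the legalization constraint can be illustrated with a one-dimensional analogue in which each position along a line carries a refinement level. This reduction, and the function name `legalize_levels`, are assumptions of the sketch; the actual constraint acts on face-neighbor and edge-neighbor boxes of the oct-tree (Appendix A).

```python
def legalize_levels(levels):
    """Raise refinement levels until neighbors differ by at most one level.

    levels[i] is the requested refinement level at position i. Levels are only
    ever increased, which mirrors marking additional elements for refinement.
    """
    levels = list(levels)
    changed = True
    while changed:
        changed = False
        for i in range(len(levels) - 1):
            a, b = levels[i], levels[i + 1]
            if a < b - 1:
                levels[i] = b - 1
                changed = True
            elif b < a - 1:
                levels[i + 1] = a - 1
                changed = True
    return levels

# Usage: one strongly refined position forces its neighbors to be refined too.
legal = legalize_levels([0, 0, 3, 0])
```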

Figure 2.19 illustrates the types of grids created in an application of the solution 
adaptive grid method. Pictured there are cuts of the initial grid and the second and 
fourth adaptive grids in a run.

Figure 2.19: Initial Grid and Two Grids Created in an Application of the Adaptive
Method.

2.5 POSTPROCESSING

A finite element postprocessing capability has been implemented to smooth out irreg- 
ularities in surface pressure distributions [69, 70]. The irregularities in the pressure 
distribution are illustrated in Figure 2.20 and arise due to the fact that the trilinear 
functions used to approximate the potential lead to essentially constant velocity in a 
given box. If more than one panel corner point is located in a given grid box then the
pressure at these points (calculated from the velocity using the isentropic formula) 
also appears to be essentially constant. 

To eliminate this apparent anomaly, a velocity $V$ is computed for each unknown
$\Phi$ or $\Psi$ located at a grid point using the following procedure.

• All regions influenced by the unknown are found. 

• At the spatial location of the unknown the velocity basis functions of these 
regions are evaluated. 

• V is given by the average of these velocity vectors. 

To evaluate the velocity at any point in space, the region containing the point is 
found and the velocity components at the eight corner unknowns of the region are 
trilinearly interpolated. 
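The two ingredients of this postprocessing step, averaging the velocity vectors at a grid point and trilinear interpolation from the eight corners of a region, can be sketched as follows. The function names, the use of local coordinates in the unit cube, and the corner indexing are assumptions made for the example.

```python
def average_velocity(vectors):
    """Component-wise average of the velocity vectors evaluated at one grid point."""
    count = len(vectors)
    return tuple(sum(v[a] for v in vectors) / count for a in range(3))

def trilinear(corner_values, s, t, u):
    """Trilinear interpolation of one scalar component inside a region.

    corner_values[i][j][k] is the value at corner (i, j, k) with i, j, k in
    {0, 1}; (s, t, u) are local coordinates of the point in the unit cube.
    """
    total = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                weight = ((s if i else 1.0 - s)
                          * (t if j else 1.0 - t)
                          * (u if k else 1.0 - u))
                total += weight * corner_values[i][j][k]
    return total

# Usage: corner data sampled from the linear function 1 + 2x + 3y + 4z,
# which trilinear interpolation reproduces exactly.
corners = [[[1.0 + 2.0 * i + 3.0 * j + 4.0 * k for k in (0, 1)]
            for j in (0, 1)] for i in (0, 1)]
value = trilinear(corners, 0.25, 0.5, 0.75)
mean = average_velocity([(1.0, 0.0, 0.0), (3.0, 2.0, 0.0)])
```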

For the sphere, the surface pressure is shown with and without the postprocessing
step outlined above in Figure 2.20 for a linear flow case and a transonic case. The
effect of postprocessing is evident.

Figure 2.20: Solutions With and Without Post Processing for a Sphere in Linear
Flow, $M_\infty$ = 0.0, and Transonic Flow, $M_\infty$ = 0.7.

2.6 SUPERSONIC FREE STREAM FORMULATION

The grid generation, discretization, and dissipation used to solve problems in
supersonic free stream are essentially identical to those used for problems in subsonic
free stream. However, due to the difference in the character of the full potential
equation in supersonic flow, there are certain differences in the way the far field boundary
conditions are imposed and the discrete equations are solved.

2.6.1 Far Field Treatment 

The same governing equation is applicable in supersonic free stream flow. However the 
far field behavior of the flow is different from that in subsonic free stream flow. The 
source parameterization (Section 2.3.2) cannot be employed in the case of supersonic 
free stream flow. The Poisson solver constructed for the Prandtl Glauert equation 
(Appendix D) is no longer valid. Instead other types of boundary conditions are 
applied at the outer boundary of the computational domain. 

Due to the hyperbolic nature of the flow, initial conditions are required at the 
inflow boundaries (typically the upstream boundary). Since in supersonic flow the 
configuration has no upstream influence, the appropriate boundary condition is that 
the perturbation potential there be zero. In implementing this boundary condition, 
the perturbation is forced to be zero at two upstream planes of the global grid. 

At the outflow boundaries of the computational domain no boundary conditions 
are required in principle, because the solution can be obtained in a marching process. 
However, since the present method uses a sparse solver on a reduced set that includes 
all the unknowns in the field it is essential to impose some boundary conditions that 
do not feed their influence upstream. In this case a $\phi_{xx} = 0$ boundary condition is 
imposed at the downstream boundary. 

On the side boundaries the initial conditions at the upstream boundary and the 
$\phi_{xx}$ boundary conditions imply that the perturbation potential at the side boundaries 
also be zero. 

It is noted that the imposition of these conditions on the outer boundary of the grid 
may cause shock reflections which could influence downstream portions of the flow 
field, particularly at low supersonic free stream Mach numbers when the Mach cone 
angles are large. A better approach would be to assume supersonic linear flow outside 
the computational grid and impose a conical flow condition at the outer boundary. 
This has not yet been implemented. 

2.6.2 Solution of Discrete Equations 

As mentioned earlier, all equations are included in the sparse matrix preconditioner, 
including the equations corresponding to the global grid points away from regions of 
local grid refinement and away from the configuration surface. To take advantage of 
the hyperbolic nature of the flow, the equations in the sparse solver are ordered by 
increasing x coordinate value, rather than by nested dissection. The system is solved 
by the nonlinear GMRES procedure exactly as is done in the subsonic free stream 
case. 
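The benefit of ordering by increasing x can be illustrated with a small sketch. It assumes an idealized, purely upwind stencil in which each equation references only its own unknown and unknowns farther upstream; this idealization, the function names, and the sample data are assumptions of the example and are not claimed to match the TRANAIR discretization.

```python
def order_by_x(coords):
    """Return unknown indices sorted by increasing x coordinate."""
    return sorted(range(len(coords)), key=lambda n: coords[n][0])


def is_lower_triangular(dependencies, order):
    """Check that every equation only references unknowns at or before its own position.

    dependencies[n] is the set of unknowns appearing in equation n.
    """
    position = {n: p for p, n in enumerate(order)}
    return all(position[m] <= position[n]
               for n in dependencies for m in dependencies[n])


# Hypothetical unknowns scattered in storage order, each depending on itself and
# on the unknown immediately upstream of it.
coords = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
dependencies = {1: {1}, 2: {2, 1}, 0: {0, 2}, 3: {3, 0}}

x_order = order_by_x(coords)
natural_order = [0, 1, 2, 3]
print(is_lower_triangular(dependencies, x_order))
print(is_lower_triangular(dependencies, natural_order))
```

In this idealized setting the x ordering yields a lower triangular matrix, so a factorization would generate no fill. Unknowns sharing the same x value would form diagonal blocks rather than single entries.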

For the supersonic free stream case, the convergence of the linear problem in 
the Newton method is essentially the same as in the subsonic free stream cases. 
The non-linear Newton iterations converge better with grid sequencing and viscosity 
damping. However, on occasion supersonic free stream cases have been observed to 
“stagnate” for a few cycles before converging further. In addition, when using solution 
adaptive gridding, it has been observed that more robust convergence behavior occurs 
if viscosity damping is used on each successive grid. This requires more iterations 
on the finer grids before turning off the extra viscosity, but provides more reliable 
convergence. 

The solution adaptive grid features of TRANAIR are the same in supersonic free 
stream flow as in subsonic free stream flows. Normal shocks and expansion regions 
are easily detected and the grid is refined to resolve the gradients. When the shocks are 
oblique, it takes more cycles of solution adaptivity to begin to detect their presence. 
Figures 2.21 through 2.24 illustrate a sequence of grids generated about a sphere-cone 
configuration. 



Figure 2.21: Solution Adaptive Grid (No. 1) for the Supersonic Cone 

The early solution adaptive grid refinements concentrate on the gradients in the 
expansion region near the upstream stagnation point on the spherical face. Three 
cycles of solution adaptation were required to get errors in the stagnation region 
reduced sufficiently so that the bow shock was well-recognized. After five cycles of 
grid refinement the bow shock is clearly developed up to the point where it becomes 
quite oblique. At that point, the grid is too coarse to sufficiently resolve the oblique 
shock and it diffuses badly. The relative distribution of the computed local error 
estimates indicates that the next cycles of solution adaptive refinement will better 
resolve the oblique portions of the bow shock, but it is clear that it is desirable to 

Figure 2.22: Solution Adaptive Grid (No. 2) for the Supersonic Cone 






2.7 PROGRAMMING CONSIDERATIONS 

The numerical method incorporated in TRANAIR has many algorithms. In implementing 
these algorithms a significant amount of effort was spent ensuring that these 
algorithms make efficient use of vector supercomputer hardware features. Since no 
assumptions were made regarding availability of machines with extremely large cen- 
tral memory, the code was designed to use out-of-core storage, e.g., the Cray SSD. 
(TRANAIR also can be easily modified for in-core applications.) This also required 
that special attention be paid to memory management and input/output issues. Data 
structures are of paramount importance to any application code and TRANAIR is 
no exception. Due to the unstructured nature of the grid and large amount of data 
to be handled, TRANAIR uses some unique data structures. Finally, it is also worth 
noting that the code has been developed by a team of people. To facilitate team work 
and minimize maintenance problems, “black box” (modular) coding was used where 
possible, leading to a large collection of library routines performing various standard 
functions. 

In the following these issues are discussed in some detail. No attempt is made to 
provide a complete list of subroutines nor to discuss any specific algorithm at great 
length. The purpose of this section is to set forth ideas that have gone into building 
the modules that make up the TRANAIR code. 

2.7.1 Memory Management 

To maximize the size of the problem that can be solved with a given amount of 
memory it is imperative that the available central memory be used efficiently. This 
issue becomes especially important when many programmers are involved in coding 
different modules and need to access the same memory locations. 

To resolve this issue, a self contained memory management system was developed. 
Most of the memory space used in the code is contained in a single large one di- 
mensional scratch array. This array is divided into smaller arrays as needed in any 
subroutine. The remaining portion of the big array is passed down through the call- 
ing sequence of any subroutine needing further scratch space. The latter routine can 
then further subdivide the array as needed. The subdivision is hierarchical and is 
implemented using several FORTRAN subroutines. When storage for an array is re- 
quested, an identifier and the length of the array are supplied. A pointer to the array 
is returned. This allows reference to the array even after “garbage collection” (defined 
below). Array storage is freed when no longer needed. When storage for an array is 
requested, if possible, a consecutive block of storage is found and allocated. However, 
if storage for many arrays has been allocated and de-allocated, the available storage 
may be fragmented with no sufficiently long consecutive block of storage in the big 
scratch array. In this case, storage in the scratch array is garbage collected, i.e., all 
allocated array storage is moved to the front of the scratch array (the pointers are 
changed) leaving a contiguous block of available storage at the back. If this block is 
still not large enough, the program will abort. The simple remedy then is to increase 
the dimension of the main scratch array. 
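The allocation and garbage collection scheme described above can be sketched as follows. This is a simplified, non-hierarchical illustration in Python rather than FORTRAN; the class name `ScratchPool`, its methods, and the use of the identifier itself as the handle are assumptions of the sketch and do not reproduce the TRANAIR routines.

```python
class ScratchPool:
    """One large scratch array carved into named blocks, with compaction on demand."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.blocks = {}  # identifier -> [offset, length]

    def _find_gap(self, length):
        """Return the start of the first free run of at least `length`, or None."""
        cursor = 0
        for offset, block_length in sorted(self.blocks.values()):
            if offset - cursor >= length:
                return cursor
            cursor = offset + block_length
        return cursor if len(self.data) - cursor >= length else None

    def compact(self):
        """Garbage collect: slide all live blocks to the front and update offsets."""
        cursor = 0
        for ident, (offset, block_length) in sorted(
                self.blocks.items(), key=lambda item: item[1][0]):
            if offset != cursor:
                self.data[cursor:cursor + block_length] = \
                    self.data[offset:offset + block_length]
                self.blocks[ident][0] = cursor
            cursor += block_length

    def allocate(self, ident, length):
        """Reserve `length` words under `ident`; compact first if fragmented."""
        start = self._find_gap(length)
        if start is None:
            self.compact()
            start = self._find_gap(length)
        if start is None:
            raise MemoryError("scratch array too small; increase its dimension")
        self.blocks[ident] = [start, length]
        return ident

    def free(self, ident):
        del self.blocks[ident]

    def offset(self, ident):
        """Current start of the block; remains valid after compaction."""
        return self.blocks[ident][0]


pool = ScratchPool(10)
pool.allocate("A", 4)
pool.allocate("B", 4)
pool.free("A")            # leaves free runs of 4 and 2 words
pool.allocate("C", 6)     # no single run is long enough, so compaction is triggered
```

Callers look up a block's offset through its handle each time, which is why references remain usable after the live blocks have been moved.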





2.7.2 Input/Output 

TRANAIR uses a centralized I/O system for temporary files. The temporary files 
are typically used to store data so that central memory can be freed for some other 
purpose. The data stored on such files includes that for the oct-tree data structure, 
the boundary operators, the GMRES search directions, and the decomposition of the 
Jacobian matrix used as a preconditioner. These files are made to reside on the SSD, 
the disk, or in central memory if available. For most purposes these files are treated 
as temporary files that are generally lost at the end of program execution. 

Data I/O is carried out through special routines. Some of the bigger datasets 
are written using unblocked FORTRAN I/O. The datasets are accessed using unit 
numbers which are determined and stored within the program. When the dataset 
in question has outlived its utility the unit can be closed, thus freeing it for other 
use as needed. This process is automatic. 

A good example of I/O usage is the residual computation procedure. The residual 
calculation requires the potential at unknown locations (nodes) and density at the 
centroids of regions. In addition, quantities such as the velocity, switching function, 
etc. are also required at region centroids. If all these quantities could be held in core 
at one time the computation of residuals would indeed be extremely fast. However, 
since the available central memory on many Cray computers is small, some of the data 
is blocked and stored on a mass storage device and computations are performed one 
block at a time. The field quantities (those defined at the nodes) are held in core as 
fields and the region based quantities are blocked and brought into central memory as 
the computations proceed. The I/O required in these operations is performed using 
the central I/O system. 

In the case of subsonic flow, two passes through the “blocks” are necessary to 
compute residuals. First, with the array of potential values at unknown locations 
stored in core, velocity operators and corner unknown indices are brought into core 
a block at a time. For each block, velocity components at element centroids and 
hence densities are computed and stored, before considering the next block. This 
step easily can be vectorized with vector length equal to the length of the block. 
Second, with arrays of potential and residual values in core, divergence operators, 
densities at centroids and corner unknown indices are brought into core a block at a 
time and used to scatter contributions to residuals at the corners. 

Other examples of blocking are found in generating and decomposing the sparse 
solver matrix (see Appendix E). 

2.7.3 Vectorization Issues 

The locally refined structure of the computational grid and the need to achieve good 
performance on vector machines make many aspects of the coding more complex. 
The fundamental change from the basic structure of a uniform Cartesian grid code 
[39] and many logically rectangular body fitted grid codes is that many of the operations 
involve indexed (indirectly addressed) arrays, i.e., in many instances, instead of 
directly addressed vector operations such as 

    X(I) = X(I) + C * Y(I), 

vector operations are required of the form 

    X(I) = X(I) + C * Y(IND(I)) 

or 

    X(IND(I)) = X(IND(I)) + C * Y(I). 

With recent Cray compilers it is possible to vectorize loops with such operations 
when no vector dependencies exist. Through use of careful coding to avoid the need 
for many double-indexed arrays, e.g., arrays of the form Y(IND1(IND2(I))), such 
operations in TRANAIR have been made to execute at high rates (typically at a third 
to a half of the peak rate possible with direct addressed arrays). Much use of this 
is made in the sparse matrix decomposition and forward and backward substitution 
phases (see Appendix E). 
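The two indexed forms, and the reason the scatter form needs distinct indices to vectorize, can be illustrated with a short sketch. The function names are assumptions of the example, indices are 0-based here rather than 1-based as in FORTRAN, and the "vectorized" variant merely mimics vector hardware by performing all reads before any write.

```python
def axpy_gather(x, c, y, ind):
    """X(I) = X(I) + C * Y(IND(I)): indexed reads, contiguous writes."""
    for i in range(len(x)):
        x[i] += c * y[ind[i]]


def axpy_scatter(x, c, y, ind):
    """X(IND(I)) = X(IND(I)) + C * Y(I): contiguous reads, indexed writes."""
    for i in range(len(y)):
        x[ind[i]] += c * y[i]


def axpy_scatter_vectorized(x, c, y, ind):
    """Same update with all reads done before any write, as a vector unit would."""
    updated = [x[ind[i]] + c * y[i] for i in range(len(y))]
    for i in range(len(y)):
        x[ind[i]] = updated[i]


def has_vector_dependency(ind):
    """A scatter has a dependency when two loop iterations write the same entry."""
    return len(set(ind)) != len(ind)


# With a repeated index, the sequential and vector-style scatters disagree.
x_seq, x_vec = [0.0], [0.0]
axpy_scatter(x_seq, 1.0, [1.0, 2.0], [0, 0])
axpy_scatter_vectorized(x_vec, 1.0, [1.0, 2.0], [0, 0])
print(x_seq, x_vec, has_vector_dependency([0, 0]))
```

The gather form never has this problem because each iteration writes a different entry of X. Only the scatter form requires the index list to be free of duplicates.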

The residual computation also uses indexed operations. As described in Section 2.4 
and Appendix B, finite element operators and velocity operators are computed for 
each element. These operators depend only on the geometry of the element. When 
the element is in a box not cut by a boundary, only the grid refinement level (relative 
to the global grid) of the box is needed to define the operator. D-region operators are 
computed and stored out-of-core. In order to apply such an operator it is necessary to 
know the indices of unknowns located at all eight corners of the element. Because of 
local grid refinement, these corner unknowns are not stored in a contiguous manner, 
thus making the resulting operations indexed. 

The residual calculations consist of two phases: gathering information for each 
region and scattering information from the regions to the corner nodes. In the first 
phase, velocity, density and switching function values are computed at the centroids of 
every region. At the $i$th region centroid, the density $\rho_i$ is obtained from the magnitude 
of velocity, the $k$th component of which is computed via 

$$ v_i^k = \sum_{j=1}^{8} V_{ij}^k \, \Phi(\mathrm{IRU}(i,j)) \qquad (2.61) $$

where $\Phi$ is potential, $V_{ij}^k$ is the operator coefficient giving the contribution to the $k$th 
velocity component at the centroid of the $i$th region from the $j$th corner unknown of 
the region, and IRU$(i,j)$ is the index of the $j$th corner unknown of the $i$th region. The 
coefficients $V_{ij}^k$ for D-regions are stored out-of-core. While $\Phi$ is held in core, velocity 
components are computed for a block of regions at a time. By ordering elements 
not cut by boundaries (standard regions) in separate blocks, each block consisting 
of regions at the same level of grid refinement, the FORTRAN loop implementing 
Eq. (2.61) involves single-indexed arrays and executes at a rate approaching 100 
megaflops on a single-processor Cray X-MP (having 9.5 nanosecond cycle time). 
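A minimal sketch of the gather phase for one block of standard regions follows. It assumes that every region in the block shares one operator (same refinement level) and uses the centroid gradient of a trilinear interpolant on a cube of side h as a stand-in for the actual velocity operator; the corner ordering, the function names, and the 0-based indexing are assumptions of the example.

```python
# Corner tags (cx, cy, cz) of a box, in a fixed but arbitrary order for this sketch.
CORNERS = [(cx, cy, cz) for cz in (0, 1) for cy in (0, 1) for cx in (0, 1)]


def standard_velocity_operator(h):
    """Centroid gradient of a trilinear function on a cube of side h.

    Entry [j][k] multiplies the potential at corner j in velocity component k.
    """
    return [[(2 * corner[k] - 1) / (4.0 * h) for k in range(3)]
            for corner in CORNERS]


def block_velocities(phi, iru, vop):
    """Centroid velocities for a block: vel[i][k] = sum_j vop[j][k] * phi[iru[i][j]]."""
    nreg = len(iru)
    vel = [[0.0, 0.0, 0.0] for _ in range(nreg)]
    for j in range(8):                # short outer loops over corners and components
        for k in range(3):
            coeff = vop[j][k]         # one coefficient for the whole block
            for i in range(nreg):     # long inner loop: a single-indexed gather
                vel[i][k] += coeff * phi[iru[i][j]]
    return vel


# Hypothetical check: a linear potential 2x + 3y - z on one box of side 0.5.
h = 0.5
phi = [2.0 * cx * h + 3.0 * cy * h - cz * h for (cx, cy, cz) in CORNERS]
velocity = block_velocities(phi, [list(range(8))], standard_velocity_operator(h))
```

Because the coefficient is the same for every region in the block, the inner loop involves only the single index array `iru`, which corresponds to the single-indexed form noted in the text.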

When some or all of the regions have supersonic flow, an extra step in the first 
phase of the residual calculation is necessary for each block to incorporate upwinding 
effects. The upwinded density, $\tilde\rho$, in region number $it$ is given by 

$$ \tilde\rho_{it} = \rho_{it} + \mu_{it} \sum_{if=1}^{6} \mathrm{SF}_{if}(\vec{V}_{it}) \sum_{ir=1}^{4} C_{it,if,ir} \left( \rho_{\mathrm{IBB}(it,if,ir)} - \rho_{it} \right) \qquad (2.62) $$

where $\{C_{it,if,ir}\}$ are the operator coefficients and $\{\mathrm{IBB}(it,if,ir)\}$ are the indices of 
regions adjacent (viz. index $ir$) to region number $it$ across element face number $if$. 
The symbol $\mu$ in Eq. (2.62) denotes the switching function and SF is a blending 
function (see Eq. (2.43)). In these equations the doublet parameters have been omitted 
for simplicity. The calculation of region centroid values of $\tilde\rho$ is done by blocks 
of regions, as was described above for values of $\rho$, with $\Phi$ stored in core. The FORTRAN 
loop implementing Eq. (2.62) is over all supersonic regions in a block (i.e., 
over block regions on which $\mu$ is greater than zero). In the loop, the region number 
$it$ of Eq. (2.62) is obtained via $it$ = IND(I), where I is the loop counter ranging 
from one to the number of supersonic regions in the block. This means the array 
IBB = IBB(IND(I), ·, ·) is an indexed array, and, consequently, the FORTRAN array 
representing $\rho_{\mathrm{IBB}(it,if,ir)}$ of Eq. (2.62) is double-indexed. The peak execution rate 
for this FORTRAN loop on a single-processor Cray X-MP (having 9.5 nanosecond 
cycle time) is about 50 megaflops. However, the work done in this loop accounts for 
an insignificant portion of the total work done in an application. 

In the second phase of the residual calculation, the region-centered quantities obtained 
in the first phase are used to compute the residuals associated with the nodal 
solution unknowns. This phase is the dominant cost in the residual calculation and so 
efficiency is very important. The residual $R_l$ for the $l$th solution unknown is obtained 
via 

$$ R_l = \sum_{\text{region } i} \rho_i \sum_{j=1}^{8} D_{ijl} \, \Phi(\mathrm{IRU}(i,j)) + \sum_{\text{region } i} (\tilde\rho_i - \rho_i) \sum_{j=1}^{8} D_{ijl} \, \Phi(\mathrm{IRU}(i,j)) \qquad (2.63) $$

where $D_{ijl}$ is the finite element divergence operator coefficient contribution to the $l$th 
unknown for the $i$th region from the unknown at corner $j$. The FORTRAN implementation 
of Eq. (2.63) uses two nested loops, reversing the order of the summations 
so that the outer loop is over the eight corner unknowns and the inner (vectorizable) 
loop is over the elements of the block. Vectorization is possible because for any outer 
loop index $j$, the $j$th corners of all the regions in the block are distinct, and so no 
vector dependency occurs. This would not be true for an arbitrary block of D-regions, 
but in TRANAIR, the corners are made distinct within a block by separating possible 
duplicates into different blocks. 
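The block separation and the reversed loop order can be sketched as follows. The greedy assignment shown is only one possible way to separate duplicates and is an assumption of this example, as are the function names and the precomputed per-corner contributions `flux[i][j]`; the report does not state how TRANAIR forms its blocks.

```python
def split_into_conflict_free_blocks(iru):
    """Group regions so that, within a block, the j-th corners are all distinct.

    iru[i][j] is the index of the j-th corner unknown of region i (0-based here).
    """
    blocks = []  # each entry: (region list, per-corner-slot sets of used unknowns)
    for i, corners in enumerate(iru):
        for regions, used in blocks:
            if all(corners[j] not in used[j] for j in range(8)):
                break
        else:
            regions, used = [], [set() for _ in range(8)]
            blocks.append((regions, used))
        regions.append(i)
        for j in range(8):
            used[j].add(corners[j])
    return [regions for regions, _ in blocks]


def scatter_block(flux, iru, block, residual):
    """Add flux[i][j] to residual[iru[i][j]], outer loop over the eight corners."""
    for j in range(8):
        for i in block:  # targets are distinct within the block, so no dependency
            residual[iru[i][j]] += flux[i][j]


# Hypothetical regions: region 1 repeats unknown 0 in corner slot 0 of region 0.
iru = [list(range(8)), [0] + list(range(8, 15)), list(range(20, 28))]
flux = [[1.0] * 8 for _ in iru]
residual = [0.0] * 28
blocks = split_into_conflict_free_blocks(iru)
for block in blocks:
    scatter_block(flux, iru, block, residual)
```

Within each block the inner loop writes to distinct entries of the residual for a fixed corner slot, which is the condition the text gives for safe vectorization.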

Careful design of these algorithms was necessary to minimize storage, allow effec- 
tive use of the Cray SSD, and achieve reasonable CPU speed. 

2.7.4 Data Structures 

TRANAIR uses a number of different data structures to facilitate compact and effi- 
cient usage of data. A prime example is the oct-tree data structure discussed at length 
in Appendix A. Among the other data structures used are the region-unknown lists, 
box-neighbor lists, operators, etc. 

2.7.5 Program Libraries 

Wherever possible “black box” (modular) coding has been used to increase flexibility. 
An example of “black box” coding is the nonlinear GMRES routine which sees the 
entire residual evaluation process as an arbitrary function to be calculated. The 
code is built up from a set of libraries containing groups of subroutines which can be 
classified together. These consist of libraries for: 

• input processor 

• solver 

• output processor 

• special purpose mathematical routines 

• sparse solver 

• Green’s function 

• general purpose utility routines 

• general purpose mathematical routines 

• abutment processor 

• fluid dynamics routines 

Chapter 3 
RESULTS 


Many results of applying TRANAIR are presented in this chapter. These results 
demonstrate the ability of the code to handle general geometry configurations in 
subsonic and supersonic free stream, the reliability with which these solutions can be 
obtained, and the flexibility of the code in allowing modeling of various flow features 
such as leading edge expansions, weak normal shocks, oblique shocks, and regions 
with different total temperatures and pressures. The example results are divided into 
three groups according to flow type. 

The first consists of cases in subsonic free stream governed by the linear Prandtl- 
Glauert equation. These cases are presented primarily for the purpose of comparing 
results with those of panel methods and analytic solutions where available. Neither 
Newton method damping nor grid sequencing is required in these cases. In each 
TRANAIR run, a single grid was constructed based on user specifications. 

The second group consists of cases in subsonic free stream where it is necessary 
to solve the full potential equation because the flow characteristics are nonlinear and 
possibly transonic. Results are presented for TRANAIR runs that employed single 
grids, grid sequencing, and solution adaptive grids. 

The third group consists of cases in supersonic free stream where again the solution 
of the full potential equation is necessary. The flow is predominantly hyperbolic in 
character. Subsonic regions and transonic flow characteristics are present in some 
cases. Solution adaptive grids were employed in all the runs in this group. 

The results presented in this chapter have been obtained over a period of two 
years. Where possible the results obtained from the most recent versions of the code 
are presented. In every case the result can be repeated to the same or improved 
accuracy with the most recent version of the code. 

In all cases presented in this paper, the solution (primarily represented by static 
pressure) is displayed at panel corner points. The static pressure is generally represented 
by its non-dimensional counterpart defined as 

$$ C_p = \frac{p - p_\infty}{\frac{1}{2} \rho_\infty q_\infty^2} \qquad (3.1) $$

where $p$ is local pressure, and $\rho_\infty$, $p_\infty$, and $q_\infty$ are the free stream density, pressure, 
and velocity magnitude. 

The results presented here were obtained on a Cray X-MP with up to 
4 MW of central memory and up to 128 MW of SSD storage, or on a Cray Y-MP. 
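As an illustration of the pressure coefficient of Eq. (3.1), the following sketch evaluates it directly from pressure and, alternatively, from the local speed using the standard isentropic relation for a perfect gas. The function names, the default ratio of specific heats of 1.4, and the assumption that this relation is the isentropic formula referred to in Section 2.5 are all assumptions of the example.

```python
def pressure_coefficient(p, p_inf, rho_inf, q_inf):
    """Eq. (3.1): (p - p_inf) / (0.5 * rho_inf * q_inf**2)."""
    return (p - p_inf) / (0.5 * rho_inf * q_inf ** 2)


def isentropic_cp(q_ratio, mach_inf, gamma=1.4):
    """Pressure coefficient from the speed ratio q / q_inf for isentropic flow.

    For mach_inf = 0 the incompressible limit 1 - (q / q_inf)**2 is returned.
    Speed ratios beyond the vacuum limit are not handled by this sketch.
    """
    if mach_inf == 0.0:
        return 1.0 - q_ratio ** 2
    base = 1.0 + 0.5 * (gamma - 1.0) * mach_inf ** 2 * (1.0 - q_ratio ** 2)
    return 2.0 / (gamma * mach_inf ** 2) * (base ** (gamma / (gamma - 1.0)) - 1.0)


# Stagnation point (q = 0) at the two sphere conditions used in this report.
cp_stagnation_linear = isentropic_cp(0.0, 0.0)
cp_stagnation_transonic = isentropic_cp(0.0, 0.7)
```

At free stream speed the coefficient is zero for any Mach number, and the stagnation value rises above one as the free stream Mach number increases.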

3.1 RESULTS FOR LINEAR FLOW 

In this section, linear flow solutions are discussed. Results for a sphere, the ONERA 
M6 wing, and the F16 fighter aircraft configuration are presented. 

3.1.1 Sphere 

For the sphere in incompressible flow, an analytic solution is available. This is a 
nontrivial problem for a Cartesian grid method since the surface intersects the grid in 
many different ways. A sphere with radius 0.8 was analyzed at $M_\infty$ = 0 and $\alpha$ = 0.0°. 
Four grids were used to test the accuracy of TRANAIR. In Figure 3.1 the paneling 
used to describe the sphere surface in the coarse and medium grid cases is shown 
(there are 1600 panels describing the geometry of the half sphere using one plane 
of symmetry). With the fine and uniform grids, the paneling was doubled in each 
direction. Planar cuts through the four grids are shown in Figure 3.2. The uniform 
grid had 123,680 elements. It is noted that the outer boundary of the computational 
domain in the uniform grid case is very close to the sphere surface. The coarse, medium, 
and fine grids shown have 10,356, 35,456, and 149,515 elements, respectively. Stagnation 
regions (those totally inside the sphere) are not included in the element totals. These 
cases were run with one plane of symmetry. Only half of each cut is shown since each 
cut is symmetric about a second plane of symmetry. 
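For reference, the analytic incompressible solution on the sphere surface can be evaluated with a few lines. The sketch uses the classical potential flow result that the surface speed is 1.5 times the free stream speed times the sine of the polar angle measured from the flow axis; the function name and the assumption that the sphere is centered at x = 0 with the free stream along x are choices made for this example.

```python
def sphere_cp_incompressible(x, radius=0.8):
    """Analytic surface pressure coefficient for potential flow about a sphere.

    With cos(theta) = x / radius, the surface speed ratio is 1.5 * sin(theta),
    so the coefficient is 1 - 2.25 * (1 - (x / radius)**2).
    """
    s = x / radius
    if abs(s) > 1.0:
        raise ValueError("x lies outside the sphere")
    return 1.0 - 2.25 * (1.0 - s * s)


# Stagnation points and the maximum-thickness station for the radius used here.
cp_front = sphere_cp_incompressible(-0.8)
cp_top = sphere_cp_incompressible(0.0)
cp_rear = sphere_cp_incompressible(0.8)
```

The coefficient equals one at both stagnation points and reaches its minimum of -1.25 at x = 0, which gives a convenient check on plotted results.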

In Figure 3.3 the surface pressure¹ for the sphere is plotted as a function of x. Also 
shown is the corresponding analytic solution. Data at all circumferential stations 
are plotted. For the 1600 panel case, there are 20 stations at each x value. The 
scatter of surface pressure at a constant x coordinate is due to the use of a Cartesian 
grid and provides a good measure of the overall accuracy. The solution accuracy 
improves significantly as the grid is refined. The expected quadratic convergence rate 
in potential as the grid is refined has been verified earlier [36] in this case. 


¹In all the subsequent discussion the term surface pressure is used to indicate the pressure 
coefficient. 

Figure 3.1: Paneling Used for Sphere in Linear Flow, 1600 Panels. 

Figure 3.2: Cuts Through Four Grids for a Sphere in Linear Flow, $M_\infty$ = 0. 
(Panel titles include: Medium Grid, Fine Grid.) 

Figure 3.3: Solutions on Four Grids for a Sphere in Linear Flow, $M_\infty$ = 0. 







3.1.2 ONERA M6 Wing 

An ONERA M6 wing is analyzed at $M_\infty$ = 0 and angle of attack $\alpha$ = 3.06°. The 
boundary is described by 1800 panels (see Figure 3.6). The panels have a very high 
aspect ratio, being much longer in the spanwise direction than in the chordwise di- 
rection. This paneling is adequate for a solution in linear flow because the solution 
also changes more rapidly in the chordwise direction than in the spanwise direction. 

TRANAIR was run on a coarse grid having 35,188 elements and a fine grid having 
249,305 elements. Vertical cuts through the two grids at the plane of symmetry are 
shown in Figure 3.4. Figure 3.5 shows a waterline cut through the coarser grid. The 
clustering of fine grid cells at the leading and trailing edges is necessary to resolve 
high velocity gradients. 

Figure 3.6 compares surface pressure at the 20% span station with a solution 
obtained with a panel method. Note that the fine grid TRANAIR solution agrees 
well with the panel method solution. The leading edge is enlarged in the third plot. 
Figure 3.7 shows two other stations from these same solutions. 

Figure 3.4: Cuts Through Two Grids for the ONERA M6 Wing in Linear Flow, 
$M_\infty$ = 0, $\alpha$ = 3.06°. (Panel titles include: Fine Grid.) 

Figure 3.5: Waterline Cut Through ONERA M6 Coarse Grid. 

Figure 3.6: Three Solutions for the ONERA M6 Wing in Linear Flow at 20% Span, 
$M_\infty$ = 0, $\alpha$ = 3.06°. (Panel titles include: Leading Edge Closeup, Paneling. 
Legend: Panel Method; o TRANAIR Fine Grid; x TRANAIR Coarse Grid.) 

Figure 3.7: Three Solutions for the ONERA M6 Wing in Linear Flow at 60% and 
80% Span, $M_\infty$ = 0, $\alpha$ = 3.06°. (Panel titles include: 80% Span Station. 
Legend: Panel Method; o TRANAIR Fine Grid; x TRANAIR Coarse Grid.) 

3.1.3 F16 Fighter Aircraft 

An F16 fighter aircraft configuration, shown in Figure 3.8, was analyzed at $M_\infty$ = 0.6 
and $\alpha$ = 4.0°. The configuration has 3510 panels. The TRANAIR run had 162,850 
elements. Figure 3.9 compares surface pressure on the wing at two stations with that 
obtained using a panel method. 



Figure 3.8: F16 Aircraft Configuration. 

3.2 RESULTS FOR NONLINEAR FLOW 

To demonstrate the nonlinear capabilities of the method, solutions for a wide variety 
of three dimensional configurations are presented. These include complex fighter and 
transport configurations as well as the geometries used in Section 3.1 above for linear 
flow. 

3.2.1 Sphere 

A sphere with radius 0.8 is analyzed at $M_\infty$ = 0.7. At this condition, the flow is 
transonic and contains a strong shock. This case was used to test the effectiveness of 
the upwinding used in TRANAIR. A fine grid was used to test the accuracy of the 
TRANAIR discretization. The grid contained about 170,000 elements and two planes 
of symmetry were used to reduce the size of the problem. A cut through this grid is 
shown in Figure 3.10. 

Figure 3.11 shows the convergence history for this case with grid sequencing and 
with viscosity damping described in the previous section. Five continuation steps 
were needed to achieve convergence with viscosity damping in this case. Significant 
step size damping was required for the first viscous problem. No step size damping 
was needed with grid sequencing. There is no significant difference in the aerodynamic 
solution obtained via viscosity damping or grid sequencing. 

Figure 3.12 shows surface Mach numbers as a function of x. Values at all circum- 
ferential stations are plotted. Because of the symmetry of the geometry and lack of 
angle of attack the solution should be axially symmetric. The TRANAIR solution is 
quite symmetric and also captures the well known re-expansion phenomenon at the 
foot of the shock. Post processing described in Section 2.5 was used. 

Figure 3.11: Convergence Histories for Viscosity Damping Method and Grid Sequencing 
Method for Sphere Case, $M_\infty$ = 0.7. (Horizontal axis: Newton steps. Curves: Grid 
Sequencing, Viscosity Damping.) 

Figure 3.12: Surface Mach Numbers for Sphere, $M_\infty$ = 0.7. 

3.2.2 ONERA M6 Wing 

The ONERA M6 wing was analyzed at $M_\infty$ = 0.84 and $\alpha$ = 3.06° using grid sequencing. 
This is a very popular test case for transonic flow codes and exhibits an oblique 
(supersonic to supersonic) shock as well as a normal (supersonic to subsonic) shock. 
Moreover, there is a fairly complicated shock pattern on the planform of the wing. 
Unless otherwise mentioned, all the results in this section were obtained using post 
processing. 

The TRANAIR results are compared to those obtained using the FLO28 code of 
Jameson [34]. The surface geometry used in the two solutions is identical. FLO28 
solves the full potential equation using a surface fitted grid and is particularly well 
suited to simple wing geometries such as the ONERA M6 wing. The TRANAIR 
solution is obtained using a grid with about 311,000 elements whereas the FLO28 
code solution was obtained on a grid with 364,000 cells. Dense grids were used in 
both codes to accurately capture the oblique shock. In Figure 3.13 two vertical cuts 
through the TRANAIR grid for this case are shown. Figure 3.14 is a waterline cut 
through the grid. 

Figure 3.15 compares surface pressures at four stations with those obtained with 
FLO28. The TRANAIR solution was obtained using flux biasing and post processing. 
It is unclear in this case whether the second order dissipation FLO28 solution offers 
improved accuracy. In this problem, TRANAIR obtained comparable accuracy at 
comparable cost. 

The ONERA M6 wing case was also analyzed using the solution adaptive grid 
feature. The initial grid had about 15,000 box elements. Intermediate and final target 
numbers of elements were specified as $N_i$ = 300,000 and $N_f$ = 440,000, respectively. 
A level limiting strategy was employed for the intermediate grids. The maximum 
level of refinement throughout the flow field was specified to be 4 levels below the 
global grid in all but the finest grid. In the finest grid 5 levels of refinement were 
permitted. One special region of interest was used to prevent refinement more than 
one level below the global grid in the tip region. No scaling of local error predictors 
was done. 

In the resulting adaptive grid run, 6 grids were created. These contained approxi- 
mately 39,000, 86,000, 184,000, 286,000, 312,000 and 423,000 elements. Figure 3.16 
shows 70% span station cuts through the initial grid and final adaptive grid. Cuts 
through the final adaptive grid at 0% and 44% span are shown in Figure 3.17. 

Figure 3.18 displays computed pressure coefficients against percentage wing chord 
at 0% and 70% span for the initial grid, the second adaptive grid and the final 
adaptive grid. These results are generally quite accurate. One notices, however, that 
the oblique shock present on the wing at about 25% chord on the 70% span station 
is smeared. The oblique shock is a relatively weak phenomenon in this problem that 
can only be detected once a very fine grid is present. 

Figure 3.13: Two Cuts Through Grid for ONERA M6 Wing, $M_\infty$ = 0.84, $\alpha$ = 3.06°. 

Figure 3.14: Waterline Cut Through Grid for ONERA M6 Wing, $M_\infty$ = 0.84, 
$\alpha$ = 3.06°. 

Figure 3.15: Comparison of Surface Pressure at Four Span Stations on ONERA M6 
Wing, $M_\infty$ = 0.84, $\alpha$ = 3.06°. (Legend includes: o TRANAIR.) 

Figure 3.16: Grid Cuts at 70% Span for the ONERA M6 Wing, $M_\infty$ = 0.84, $\alpha$ = 3.06°. 
(Panel titles: 70% Span, Initial Grid; 70% Span, Final Adaptive Grid.) 

Figure 3.17: Grid Cuts at 0% and 44% Span for the ONERA M6 Wing, $M_\infty$ = 0.84, 
$\alpha$ = 3.06°. 

Figure 3.18: Pressure Coefficients for ONERA M6 Wing, $M_\infty$ = 0.84, $\alpha$ = 3.06°. 

Figure 3.19: Wing Pressures for the F16, $M_\infty$ = 0.9, $\alpha$ = 4.0°. 

3.2.3 F16 Fighter Aircraft 

The F16 shown in Figure 3.8 was analyzed at $M_\infty$ = 0.9 and $\alpha$ = 4.0°. The TRANAIR 
grid had about 189,000 elements. A comparison of computed surface pressure with 
wind tunnel data at two wing stations is shown in Figure 3.19. The agreement is good 
considering the fact that boundary layer effects are not yet included in TRANAIR. 

Another configuration of interest is the F16 with tanks and missiles shown in Figure 
1.2. This is a very difficult case for surface fitted grid codes. The finest TRANAIR 
grid in this grid sequencing run contained about 216,000 elements. Figure 3.20 shows 
three plane cuts through the computational grid. Figure 3.21 compares computed 
surface pressure just inboard and just outboard of the tank strut with TRANAIR 
results for the F16 without tanks and missiles. The effect of the tank is as expected. 

The F16 fighter with tanks and missiles was also analyzed using the solution adaptive
method. This case involves extremely complex geometry, which leads to some very
severe flow regions and makes it a demanding test of solution adaptive refinement. In
the run, a starting grid with 8,224 boxes (a 33 x 9 x 13 global grid with a maximum
of 3 levels of panel induced refinement) was used.

It was assumed that the region of primary interest was the wing/tank, and that the
major features of the flow about the fuselage should also be captured. Earlier test runs
indicated several regions where developing high flow gradients necessitated restriction
of refinement. In the wake region behind the missile, refinement was restricted to one
level; on the fuselage and on the tank it was restricted to four levels. A strong error
indicator de-emphasis was necessary in the rear of the fuselage, as fine grid in the
cut-out region leads to very high flow gradients (and hence more grid refinement).

Figure 3.20: Cuts Through the Grid for the F16 With Tanks and Missiles, M∞ =
0.9, α = 4.0° (recovered panel titles: waterline cut through wing; cut through
underwing tank and strut).

Three adaptive grids were used, generating grids with 25814, 82492 and 256667 
boxes. The final grid has a maximum of six levels of grid refinement, found at the 
wing leading edge. Fourth- and fifth-level grid is found in the shock regions, strut and 
tail leading edge, missile and the front of the fuselage. Fig. 3.22 shows cuts through 
the grid at y = 0 (plane of symmetry), y = 72 (tank-strut location) and z = 94
(waterline cut through the wing). 

Fig. 3.23 shows Cp at y = 72 (inboard side of the strut), and along the crown line
of the front of the fuselage (the grid is shown in Fig. 3.22). It is worth noting that
when using adaptive refinement extra care must be taken with geometry definition.
Fig. 3.24 shows the grid cut at y = 48 with an enlargement in the strake area. A
small depression in the surface geometry is picked up by the error estimator and the 
grid is refined there to the maximum level. 
Figure 3.22: Cuts Through the Final Adaptive Grid for the F16 With Tanks and
Missiles, M∞ = 0.9, α = 4.0°.

Figure 3.23: Computed Wing Pressures for Adaptive Grid Run of the F16 with Tanks
and Missiles, Inboard Side of Strut and Crown Line, M∞ = 0.9, α = 4.0°.

Figure 3.24: Cuts Through Adaptive Grid for the F16 With Tanks a:
the Strake, M∞ = 0.9, α = 4.0°.

3.2.4 Boeing 747-200

TRANAIR was used to analyze the flow about a 747-200 transport at M∞ = 0.8 and
α = 2.7°. For this case, M∞ = 0.8 is approximately the largest free stream Mach
number at which an inviscid solver can obtain accurate results. The configuration 
included wing, body, struts and nacelles. Over 20,000 panels were used to describe 
the surface geometry of the symmetric model (see Figure 3.25). 

The finest grid used in the grid sequencing run for this case consisted of approx- 
imately 219,000 finite elements. Figure 3.26 shows two cuts through the grid. The 
cut shown in Figure 3.26A is a yz plane cut and passes through the outboard nacelle 
strut and core cowl and through the prescribed wakes behind the inboard strut and 
nacelle. The cut shown in Figure 3.26B is an xz plane cut through the outboard 
nacelle. 

Figure 3.27 compares TRANAIR results with wind tunnel pressure data at four 
span stations of the wing. Overall, one sees very good agreement with experiment. 
Most of the differences are seen in the upper surface pressures and are attributable 
to viscous boundary layer effects not currently modeled in TRANAIR. By comparing 
lower surface pressure profiles, one can clearly see the effect of the outboard nacelle 
at the 69% span station, which is shown in Figure 3.26B. The very high speed local 
flow near the leading edge on the upper surface at this span station is due to the 
presence of a strut cap connected to the outboard strut and wing leading edge. 

The same case was also analyzed using the adaptive method. In applying the 
solution adaptive grid method an initial grid containing about 26,000 elements was
used. The specified intermediate and target numbers of box elements were 125,000 
and 250,000, respectively. Three special regions of interest were specified to guide 
the adaptive grid method. These included one about the wing tip, one under the 
wing enclosing the nacelles, and one above the wing. Four levels of refinement were 
permitted for all elements except in the special region of interest above the wing. 
There, 4 to 6 levels were permitted from the body to the wing tip in all grids except 
the final grid, for which 5 to 7 levels of refinement were permitted. Since wind tunnel 
test data for comparison were only available on the wing, the importance of this 
region was (further) emphasized by specifying a scaling factor of 8 for the local error 
predictors in the special region of interest containing the wing. A scaling factor of 2 
was used in the special region of interest containing the nacelles. Predictors in other 
regions were not scaled. 

Four grids were created in a run with the solution adaptive grid method. These 
contained approximately 58,000, 123,000, 133,000 and 243,000 elements. Figure 3.28
shows 69% and 96% wing span station cuts through the initial grid and final adaptive 
grid. Figure 3.29 compares computed wing pressures with wind tunnel test data at 
four wing span stations. The method yielded results that compare favorably with the 
wind tunnel experimental data. In this case, the solution was actually computed at 
69% span, resulting in an apparent difference from the results in Fig. 3.27 in which 
interpolation to 69% was performed. 
Figure 3.25: 747-200 Transport Configuration (top view of wing and body).

Figure 3.26: Two Cuts Through TRANAIR Grid for 747-200 Case. A. Constant x Cut
Behind Inboard Nacelle. B. Constant y Cut Through Outboard Nacelle.
Figure 3.28: Grid Cuts at 69% and 96% Wing Span for 747-200, M∞ = .80, α = 2.70°
(panels: initial grid and final adaptive grid at each span station).

Figure 3.29: Wing Pressures for 747-200, M∞ = .80, α = 2.70°.

Figure 3.30: Cut Through Grid for an Axisymmetric Powered Nacelle, M∞ = 0.1.

3.2.5 Axisymmetric Nacelle with Powered Plume 

The next case illustrates the capability of the TRANAIR code to model different 
total pressure in an axisymmetric nacelle for which static test data is available [71]. 
The total pressure ratio in the powered stream was 2.807 and that in the primary 
stream was 2.3425. This configuration is shown in Figure 3.30 and had about 5000 
panels. The wakes were paneled so that the powered streams maintain equal area 
downstream from the exits. Two planes of symmetry were used for this case. Results
are shown for the final grid with 85,983 elements. Five grids were used in this
grid sequencing run with 539, 880, 2,828, 16,030, and 85,983 elements. Figure 3.31
gives the convergence history for this case. The steps on the coarser grids are scaled 
by the number of elements in the grid. 

The static pressure on the core cowl is compared with experimental data and with 
the results of running a Navier-Stokes code PARC2D [71, 72] in Figure 3.32. In this 
case PARC2D predicted no total pressure loss in the fan stream. Thus, isentropic 
modeling can capture the major features of this flow. 

3.2.6 Analysis of an Installed Transport with Power Effects 

Finally, the analysis of a transport aircraft with installed powered nacelles is pre- 
sented. The plumes behind the nacelle are simulated as regions of different total 
pressure and temperature. In Figure 3.33a and 3.33b, the paneling for the config- 
uration and a typical section of the grid with about 230,000 boxes are shown. In 
Figure 3.33c and 3.33d the pressure computed at an underwing station and inboard 
strut station with and without power (flight idle (ram) and cruise conditions) are 
compared. The effect of power on the local flow is obvious. This case demonstrates 
the capability of the TRANAIR code to handle power effects, a capability usually 
associated with an Euler formulation. 
Figure 3.31: Convergence History for Grid Sequencing Method for Axisymmetric
Powered Nacelle Case, M∞ = 0.1.

Figure 3.33: TRANAIR Analysis of a Transport with Wing/Body/Nacelle/Strut and
Power.

3.3 RESULTS FOR SUPERSONIC FREE STREAM FLOW

The supersonic free stream capability has been added to TRANAIR only recently 
and has not been exercised as thoroughly as the subsonic free stream capability. The 
supersonic free stream capabilities of TRANAIR have been tested on a cone-sphere 
configuration (compared with an analytic solution), two delta wing configurations 
with supersonic and subsonic leading edges, respectively (compared against a linear 
panel code solution and against the SIMP full potential code[73]), and an F16 con- 
figuration (comparison with experimental data and with a solution obtained with the 
SIMP full potential code). All configurations were analyzed with the solution adap- 
tive gridding option of TRANAIR. In addition, to demonstrate the abilities of the 
solution adaptive gridding to capture a bow shock, a solution has been obtained on 
the reversed cone-sphere configuration — a sphere-cone. Some limited comparisons 
have been made with experimental data obtained from some standard textbooks. 

3.3.1 Cone-Sphere Configuration. 

Figure 3.34 compares the pressure distribution obtained with TRANAIR, with SIMP, 
with EMTAC[74], and with the analytic solution for flow over a cone of a 10 degree 
half angle. The flow conditions were a free stream Mach number M∞ = 1.414 and
α = β = 0°.

The radius of the cone at the spherical cap was 1.76327 units. To avoid contamina- 
tion of the surface solution by reflections of shocks from the outer faces of the global 
grid, the global grid was extended to twelve units normal to the axis of the cone. 

The final computational grid generated by TRANAIR is illustrated in Figure 3.35. 
Solution adaptation was performed five times. A minimum of one level of refinement 
near the boundary and a maximum of three levels was specified for the candidate 
initial grid. 

Note that the inviscid solution (obtained by TRANAIR, SIMP and EMTAC) pre- 
dicts a continued expansion to vacuum on the downstream end of the cone-sphere. 

In a real flow, viscous effects would produce separation before the expansion reached 
vacuum. The fictitious gas Mach number was raised to 7.0 for this analysis. Even 
at this value, the Mach number of the flow expands beyond Mach 15 before shocking 
down to stagnation at the aft portion of the sphere. The fictitious gas Mach number 
capability of TRANAIR allows it to obtain a solution to this case. It is interesting 
to note that with the fictitious gas model, the TRANAIR solution matches the Euler 
code solution obtained by EMTAC, rather than the SIMP solution. Because of the 
velocity gradients in the flow field in the aft portion of the cone-sphere, all of 
the solution adapted gridding has been attracted to the downstream spherical cap. 

The tip “shock” went undetected by the solution adaptive gridding in the sense that 
no refinement was produced in the field to capture the tip discontinuity. However, 
the surface pressure distribution agrees with the analytic solution, probably because 
the tip shock is a weak phenomenon. 
3.3.2 Delta Wing Configurations.

Figure 3.36 compares the spanwise pressure distribution for supersonic free stream
flow past a delta wing with subsonic leading edges with a linear panel solution. The
pressure distribution is illustrated at the 90% chord station. The free stream
conditions were M∞ = 1.414 and α = β = 0°. The thickness at full chord on the plane
of symmetry was 2% of chord with a linear variation of thickness from the tip to full
chord and in the direction normal to the plane of symmetry. The leading edge of the
wing is swept to an angle of 60°. Solution adaptive gridding with four cycles of grid
adaptation was performed. Figure 3.37 illustrates representative cuts through the
computational grid.

Figure 3.38 illustrates the pressure distribution on a supersonic leading edge delta 
wing. The sweep angle of the subsonic leading edge wing was changed to 30° for this 
case. The TRANAIR solution agrees with the linear panel code solution up to the 
Mach cone emanating from the tip of the delta wing. The linear solution yields a 
constant Cp in the spanwise distribution, while the TRANAIR solution overshoots, 
presumably due to nonlinear effects. It is also possible that additional grid density 
might remove the overshoot. Because the flow gradients are relatively mild, some 
“hands-on” specification of grid may be required for this case. Figure 3.39 illustrates 
several cuts through the computational grid for the final (fourth) solution adapted 
grid obtained by the calculation. 
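
The distinction between the subsonic and supersonic leading edge cases above follows
from the free stream Mach number component normal to the leading edge. The short
sketch below is illustrative only and is not part of TRANAIR. It assumes that the
sweep angle is measured in the wing plane from the normal to the free stream and that
α = β = 0°; the function names are hypothetical.

```python
import math


def mach_angle_deg(mach):
    """Mach angle asin(1/M) in degrees; defined only for supersonic flow."""
    if mach <= 1.0:
        raise ValueError("Mach angle is defined only for M > 1")
    return math.degrees(math.asin(1.0 / mach))


def normal_mach(mach, sweep_deg):
    """Free stream Mach number component normal to a swept leading edge."""
    return mach * math.cos(math.radians(sweep_deg))


def leading_edge_type(mach, sweep_deg):
    """Classify the leading edge by its normal Mach number."""
    return "supersonic" if normal_mach(mach, sweep_deg) > 1.0 else "subsonic"


if __name__ == "__main__":
    # The two delta wings of this section: 60 and 30 degrees of sweep at M = 1.414.
    for sweep in (60.0, 30.0):
        print(sweep, leading_edge_type(1.414, sweep))
```

At M∞ = 1.414 the Mach angle is close to 45°, so a leading edge swept 60° lies inside
the Mach cone from the apex (subsonic leading edge), while one swept 30° lies outside
it (supersonic leading edge).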


3.3.3 F16 Configuration. 

Figures 3.40 through 3.45 compare surface pressure distributions on the F16 configu- 
ration as predicted by TRANAIR and by SIMP. Figures 3.40 through 3.42 illustrate 
top, bottom and side views of the pressure distributions at free stream conditions of 
M∞ = 1.414 and α = 4°. Figures 3.43 through 3.45 compare the solutions for free
stream conditions of M∞ = 2.0 and α = 2°. In general the correlation in the pressure
distributions is quite close. The TRANAIR pressure predictions tend to be smoother
and less jagged. At M∞ = 1.414 the SIMP code produced an abnormal feature at
the outboard trailing edge of the wing. This abnormality propagated in an inward 
direction toward the tail. When this intersects the configuration at the horizontal 
tail, a number of unusual reflections occur, contaminating the SIMP solution. The 
origin of this feature is not presently known. 

Another significant difference at both M∞ = 1.414 and M∞ = 2.0 occurs on the
lower surface in the region where the strake and body join. This is attributed to some 
differences in the gridding of the surface for SIMP. It was not possible to obtain an 
accurate representation of the diverter channel with SIMP and so the surface geometry 
was modified in this region. The result downstream of this modification was a more 
sudden change in slope in the region under the strake, resulting in a greater degree of 
stagnation in this area of the configuration. This is an artifact of the inability of SIMP 
to generate a grid which accurately describes the configuration geometry. TRANAIR 
accepts the paneled definition of the F16 geometry and hierarchically refines the grid 
where required by solution gradients to more accurately model the flow field. 
Figure 3.46 compares the pressure distribution predicted by TRANAIR with wind
tunnel measurements on an F16 configuration with tip missiles at M∞ = 1.2 and
α = 4°. Note that this configuration (because of the existence of multiply connected
regions in streamwise cuts of the configuration) could not be run in the SIMP code.
Figure 3.47 illustrates some representative cuts of the final computational grid for the 
TRANAIR solution. Three cycles of grid adaptation were performed. 
Figure 3.34: Cp Distribution on Cone
EMTAC and Analytic Solution.

Figure 3.35: Final Computational Grid for Cone-Sphere Configuration.

Figure 3.36: Cp Distribution on Subsonic Leading Edge Delta Wing at Mach 1.414.
TRANAIR vs A502 Solution.

Figure 3.37: Final Computational Grid for Subsonic Leading Edge Delta Wing
Configuration.

Figure 3.38: Cp Distribution on Supersonic Leading Edge Delta Wing at Mach 1.414.
TRANAIR vs A502 Solution.

Figure 3.39: Final Computational Grid for Supersonic Leading Edge Delta Wing
Configuration.

Figure 3.40: Cp Distribution on Upper Surface of F16, M∞ = 1.414, TRANAIR vs
SIMP.

Figure 3.41: Cp Distribution on Lower Surface of F16, M∞ = 1.414, TRANAIR vs
SIMP.

Figure 3.42: Cp Distribution on Side View of F16, M∞ = 1.414, TRANAIR vs SIMP.

Figure 3.43: Cp Distribution on Upper Surface of F16, M∞ = 2.0, TRANAIR vs
SIMP.

Figure 3.44: Cp Distribution on Lower Surface of F16, M∞ = 2.0, TRANAIR vs
SIMP.

Figure 3.45: Cp Distribution on Side View of F16, M∞ = 2.0, TRANAIR vs SIMP.

Figure 3.47: Representative Cuts Through Computational Grid for F16 Configuration
With Tip Missile.



3.3.4 Bow Shocks 

To test the ability of the solution adaptivity in TRANAIR to capture bow shocks, 
the cone sphere configuration was reversed and the case was re-run under the same 
free stream conditions. Figures 2.21 through 2.24 illustrate the computational grids 
generated by TRANAIR for this configuration. The bow shock is captured up to the 
point that the shock becomes oblique. Five cycles of grid adaptation were used to 
obtain the solution in Figure 3.48. The early refinements were primarily attracted to 
the subsonic region between the configuration surface and the bow shock. After the 
third cycle of solution adaptive refinement, the estimated errors in the subsonic region 
had been reduced enough so that the error indicators in the bow shock dominate 
the refinement process. After five cycles of solution adaptive gridding, the fringes 
of the refinement along the bow shock are the only significant areas indicated for 
further refinement. Figure 3.48 compares TRANAIR predictions for the location of 
the bow shock over a range of Mach numbers from 1.01 to 2.8 with experimental 
data published in some standard fluid flow textbooks [75], [76]. TRANAIR predicts 
bow shock locations somewhat downstream of the experimental curves, but there is 
some significant scatter in the experimental data as indicated by the three positions 
derived from shock position data for 0.25”, 0.5” and 1.0” spheres. 

To test whether the solution adaptivity could detect and capture a bow shock in a 
realistic configuration, a TRANAIR analysis was performed on the F16 configuration 
with underwing fuel tanks present. Five cycles of solution adaptivity were performed. 
Figure 3.49 illustrates the grid generated by TRANAIR for a chordwise cut through 
the wing and underwing tank. There is clearly a bubble of subsonic flow both in front 
of the underwing tank and in front of the strut supporting the tank. The solution 
adaptive gridding has clearly detected the bow shock and resolved it up to the sonic 
point, where the bow shock weakens and becomes oblique. This behavior is very 
similar to that observed for the sphere-cone. 
Figure 3.49: Computational Grid for F16 Configuration With Tip Missiles and Wing
Tanks, M∞ = 1.2, α = 4°.

3.3.5 General Observations.


For supersonic free stream flows, particularly in the case of supersonic leading edges, 
surface pressure peaks at leading edges tend to be suppressed, compared to subsonic 
free stream cases or to subsonic leading edges. In addition the surface pressure
distributions tend to be flatter and have smoother gradients. On account of this
it is possible to obtain solutions for supersonic free streams using somewhat smaller 
computational grids than for subsonic free streams. Many of the results obtained 
in this section used fewer than 100,000 boxes in the entire grid. However, in super- 
sonic free stream flows, aft-facing portions of the configuration can easily generate 
very large gradients and tend to attract a disproportionate degree of refined grid. In 
addition, whenever a bow shock is present, the velocity gradients become very large 
in the region between the bow shock and the configuration surface. This also tends 
to attract grid refinement in preference over the field discontinuity itself, particularly 
when the shocks become oblique. It is quite possible to capture bow shocks with 
the current solution adaptive gridding in TRANAIR, but a larger number of cycles 
of solution adaptivity or some extra guidance by the user is advisable to make sure 
that the grid refinement goes into regions of more importance to the application. As 
an extreme, but very realistic example: left to its own preferences, TRANAIR would 
refine the cutout regions of the F16, particularly the cutouts near the horizontal tail, 
in preference to any other portion of the configuration. In the present state of the 
code, it is recommended that, for supersonic free stream flows, some initial analyses 
be performed on fairly coarse grids (up to 100,000 boxes, and three to four cycles 
of solution adaptivity). After these results are obtained, the solution and grid can 
be examined (for example, by using the TGRAF program, described in Appendix B 
of the User’s manual), and appropriate user directives concerning regions for extra 
emphasis and de-emphasis can be defined. 
Chapter 4

FUTURE DIRECTIONS 


The ultimate goal of this work is to offer designers a reliable, general purpose, full con- 
figuration flow analysis tool. For this purpose we have preferred to start with a reason- 
ably simple nonlinear physical model (the full potential equation) where we are certain 
that such an objective can be met. At this time we have implemented an approach 
to solving the full potential equation into a computer code called TRANAIR. Results
obtained from this code on a variety of configurations have shown that TRANAIR 
is capable of achieving this ultimate goal. In fact TRANAIR is currently being used 
successfully by many engineers[49], [50], [82] to analyze complex configurations. 

However, TRANAIR is by no means a finished product. First, a variety of im- 
provements must be made before TRANAIR can really be thought of as a reliable, 
general user tool. Second, should these improvements be made, the next order of 
business would be to enlarge the scope of problems addressed by TRANAIR. Cur- 
rently TRANAIR allows regions of differing total temperature and pressure, which 
is important for simulating propulsion effects. An additional capability which would 
be desirable is the capturing of vortex, or wake sheets. In theory this can be done 
within the framework of a full potential approach. (However, it can be argued that 
a method which allows differing total pressures and temperatures as well as wake 
capturing is three-fourths of the way to an Euler method.) Another highly desir- 
able capability would be the simulation of boundary layer effects. Such effects are 
extremely important in the transonic flow regime. A final capability, which could 
greatly aid the design process is an optimization algorithm, allowing the user to spec- 
ify certain desirable flow features which TRANAIR would then try to achieve with 
geometry adjustments. 

In this chapter, we discuss some ideas on future efforts to improve TRANAIR. 
In Section 4.1 we discuss efforts required to improve the reliability, accuracy and 
efficiency of the current code. In the remaining sections we discuss improvements 
in capability. In Section 4.2 we explore some of the issues involved in extending 
TRANAIR to solve the full Euler equations, which would automatically yield a wake 
capturing capability. An alternate, and preferred approach, still within the framework 
of the full potential equation is described in Section 4.3. In Section 4.4 we briefly 
discuss the addition of a boundary layer capability to TRANAIR. In Section 4.5 we 
discuss implementation of design and optimization capability. 
4.1 IMPROVEMENTS TO THE METHOD

4.1.1 Reliability and Efficiency Improvements 

In the current implementation of the GMRES algorithm the iterative procedure con- 
tinues until the preconditioned residuals are reduced by a fixed fraction (assuming 
the maximum allowable number of iterations is sufficiently high). Because of the use 
of a drop tolerance in the sparse solver the condition number of the preconditioned 
linear system solved by GMRES may vary considerably. Such a condition number 
should be taken into account when assigning the above fraction. There are various 
ways for estimating the condition number. For example, comparing the reduction in 
real residuals with the reduction in preconditioned residuals can give a lower bound. 

On occasion Newton’s method has trouble converging on coarse grids, but does 
quite well on fine grids. This usually happens when the configuration involves small 
diameter plumes of differing total pressure and temperature. We suspect that on 
coarse grids the plume geometry is not well resolved, leading to a nearly singular 
boundary value problem. It would be fairly simple to ensure that the coarsest grid 
always resolves small scale geometrical features. Moreover, the coarse grid should be 
the minimal grid which does this, as denser grids may fix the shocks in the wrong 
locations. 

In the case of supersonic free stream flow a local free jet boundary condition 
(i.e. zero perturbation potential) is currently imposed on the top, bottom and side 
faces of the computational box. Such a condition is certainly better than a solid 
wall condition, but both cause wave reflections back into the computational box. 
No local boundary condition is entirely accurate, but an outgoing wave condition is 
undoubtedly superior to a free jet condition and should be implemented. 

Currently the drop tolerance used by the sparse solver is assigned by the user 
based on the knowledge that a drop tolerance somewhere between .001 and .0001 
has generally worked in the past. However, even in this range fill-in (and hence 
SSD usage) can vary considerably. This is not an issue that the user should have to 
worry about. The user should only specify the size of the SSD storage available, and 
the code should then adaptively determine a drop tolerance which would lead to a 
decomposition of roughly this size. 
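
The adaptive selection proposed above can be sketched as a bisection on the drop
tolerance. This is a minimal sketch and not TRANAIR code: the names
choose_drop_tolerance, fill_for and toy_fill are hypothetical, and the sketch assumes
that the storage required never increases as the tolerance is loosened. In a real code
each call to fill_for would require a trial (or symbolic) decomposition, so the number
of bisection steps would have to be kept small.

```python
import math


def choose_drop_tolerance(fill_for, budget, tol_lo=1.0e-6, tol_hi=1.0e-1, iters=30):
    """Return the tightest tolerance in [tol_lo, tol_hi] whose fill fits the budget.

    fill_for(tol) gives the storage needed at a given drop tolerance and is
    assumed to be non-increasing in tol.
    """
    if fill_for(tol_hi) > budget:
        raise ValueError("budget too small even at the loosest tolerance")
    if fill_for(tol_lo) <= budget:
        return tol_lo
    # Bisect in log space, since useful tolerances span several decades.
    lo, hi = math.log(tol_lo), math.log(tol_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fill_for(math.exp(mid)) <= budget:
            hi = mid  # fits: try a tighter (smaller) tolerance
        else:
            lo = mid  # too much fill: loosen the tolerance
    return math.exp(hi)


def toy_fill(tol):
    """Hypothetical stand-in for the fill of a trial decomposition."""
    return 2.0e5 * tol ** -0.4


if __name__ == "__main__":
    tol = choose_drop_tolerance(toy_fill, budget=3.0e6)
    print(tol, toy_fill(tol))
```

The returned tolerance always satisfies the budget, because the upper end of the
bracket is only ever moved to a tolerance that has been checked to fit.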

Currently the off diagonal terms in the sparse matrix decomposition are dropped 
when their magnitude is smaller than the drop tolerance times the magnitude of 
the corresponding column (or row) diagonal. This works reasonably well, but many 
improvements which would substantially reduce decomposition costs and storage are 
possible. For example, the terms which are dropped could be added to the diagonal 
or else the diagonal could be augmented by a fixed factor to improve stability when 
poor conditioning is suspected. 
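
The dropping rule and the first of the suggested modifications can be illustrated on
a small dense matrix. The sketch below is not the TRANAIR sparse solver: it uses plain
lists instead of sparse storage, performs no pivoting, and drops entries only after a
row has been fully eliminated. The function name ilu_drop and the flag lump_dropped
are hypothetical, and lumping only the dropped upper triangular terms onto the
diagonal is one assumed reading of "added to the diagonal".

```python
def ilu_drop(a, drop_tol, lump_dropped=False):
    """Incomplete LU of a dense square matrix (list of lists) with threshold dropping.

    Returns (lu, kept). lu stores the multipliers of a unit lower triangular L
    below the diagonal and U on and above it; kept counts the stored nonzeros.
    """
    n = len(a)
    lu = [row[:] for row in a]
    for i in range(n):
        # Eliminate row i against the already factored rows k < i.
        for k in range(i):
            if lu[i][k] == 0.0:
                continue
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
        # Drop off-diagonal entries that are small relative to the row diagonal.
        threshold = drop_tol * abs(lu[i][i])
        for j in range(n):
            if j == i or lu[i][j] == 0.0 or abs(lu[i][j]) >= threshold:
                continue
            if lump_dropped and j > i:
                lu[i][i] += lu[i][j]  # keeps the row sum of U unchanged
            lu[i][j] = 0.0
    kept = sum(1 for row in lu for value in row if value != 0.0)
    return lu, kept


if __name__ == "__main__":
    a = [[4.0, -1.0, 0.01, 0.0],
         [-1.0, 4.0, -1.0, 0.01],
         [0.01, -1.0, 4.0, -1.0],
         [0.0, 0.01, -1.0, 4.0]]
    for tol in (0.0, 0.05):
        print(tol, ilu_drop(a, tol)[1])
```

With drop_tol set to zero nothing is dropped and the routine reduces to an exact LU
factorization, which gives a simple check on the elimination loop.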

Computation of the finite element and upwinding operators comprises from one- 
third to one-half of the run cost of the current code. A good share of this cost could be 
eliminated if operators associated with T-boxes which do not get refined or derefined 
could be saved from one grid to the next. 

One of the most time consuming aspects of a TRANAIR analysis is the construction
of networks of panels from available configuration lofts. In the future it would
probably be best for TRANAIR to get its surface definition from the lofts directly.
Given a loft for a surface patch TRANAIR could interrogate the loft in an adaptive 
manner to build an unstructured surface triangularization whose density is deter- 
mined by the estimated local curvature. Because the cost of solution depends very 
weakly on surface discretization it would be possible to make the initial discretization 
sufficiently dense to accommodate the finest grid. However, the unstructured nature 
of the surface discretization would require the user to supply additional information 
concerning the nature of output of surface flow quantities. 

The sparse solver is used as a preconditioner for unknowns in the reduced set. 
Currently the reduced set contains all unknowns except for those located at subsonic 
global grid nodes. These unknowns are preconditioned by the much cheaper Poisson 
solver. Using standard elliptic multi-grid methods it should be possible to extend the 
Poisson solver to handle refined grids. This would allow the elimination of subsonic 
refined grid unknowns from the reduced set as long as they were not located at the 
boundary. In many instances the size of the reduced set would decrease by as much as 
50% leading to a substantial reduction in the costs associated with the sparse solver. 

The process of interpolating from a coarse grid to a fine grid has already been 
developed to facilitate grid sequencing. Such a process could also be the basis for 
implementing a multigrid solution procedure. We recommend a limited implementa- 
tion wherein the sparse solver preconditioner calculated on the next to finest grid is 
also used on the finest grid. This can be done by collecting residuals onto the coarser 
grid, using the sparse solver preconditioner as a smoother on this grid, distributing 
the corrections to the fine grid and locally smoothing the fine grid residuals. Since 
the fine grid is generally at least twice as large as the next coarsest grid the CPU and 
storage savings of such a procedure could be very significant. 
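
The four steps of the limited multigrid procedure described above (collect residuals,
solve on the coarser grid, distribute corrections, smooth locally) can be sketched on
a model problem. Everything specific in the sketch is an assumption made for
illustration: a one-dimensional Poisson problem replaces the TRANAIR operators, an
exact tridiagonal solve stands in for the sparse solver preconditioner on the coarser
grid, and weighted Jacobi stands in for the local fine grid smoother.

```python
import math


def apply_a(u, h):
    """Second-difference operator for -u'' with zero Dirichlet end values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_a(u, h))]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def jacobi(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps: the local fine grid smoother."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * 0.5 * h * h * ri for ui, ri in zip(u, r)]
    return u


def solve_coarse(rhs, h):
    """Thomas algorithm for the same operator with spacing h (coarse grid solve)."""
    n = len(rhs)
    diag, off = 2.0 / (h * h), -1.0 / (h * h)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def two_grid_cycle(u, f, h):
    """One cycle on a fine grid with 2*nc + 1 interior points."""
    r = residual(u, f, h)
    nc = (len(u) - 1) // 2
    # Collect residuals onto the coarser grid (full weighting).
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(nc)]
    ec = solve_coarse(rc, 2.0 * h)
    # Distribute corrections to the fine grid (linear interpolation).
    e = [0.0] * len(u)
    for j in range(nc):
        e[2 * j] += 0.5 * ec[j]
        e[2 * j + 1] += ec[j]
        e[2 * j + 2] += 0.5 * ec[j]
    u = [ui + ei for ui, ei in zip(u, e)]
    # Locally smooth the fine grid residuals.
    return jacobi(u, f, h)


if __name__ == "__main__":
    n = 31
    h = 1.0 / (n + 1)
    f, u = [1.0] * n, [0.0] * n
    for cycle in range(3):
        u = two_grid_cycle(u, f, h)
        print(cycle, norm(residual(u, f, h)))
```

Only the coarser grid carries a factorization here, which is the source of the storage
saving the text anticipates; the fine grid work is limited to residual evaluation and
local smoothing.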


4.1.2 Upwinding Improvements 

Currently, the “entropy condition” for ensuring compression shocks is achieved through 
a first order density or mass flux retardation procedure. This procedure occasionally
causes reliability problems and affects the efficiency and accuracy of the code. The
first problem is that the upwinding is only first order whereas the remainder of the 
method is second order. This forces the grid to be finer in supersonic zones than 
in subsonic zones, although in supersonic regions the adaptive feature of the code 
will limit grid refinement to regions with high flow gradients, somewhat alleviating 
the problem. One course of action would be to implement a second order accurate 
upwinding algorithm. Historically such algorithms have not been very robust, espe- 
cially for complex geometries. A better course of action might be to develop a first 
order algorithm with a smaller error coefficient. In fact, flux biasing (or retardation) 
seems to be much superior to density retardation in this regard and there appear to 
be possibilities of improving it by taking variations of stream tube areas into account. 
Unfortunately, flux biasing can be shown to be singular when the flow becomes one 
dimensional and the Mach number oscillates about Mach 1. This makes flux biasing 
substantially less robust than density biasing in practice and it will first be necessary
to modify the biasing formula in a manner which simulates the use of a subsonic
cutoff Mach number in density retardation formulas. 
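
To make the role of the cutoff Mach number concrete, the sketch below shows one commonly used textbook form of first order density retardation in one dimension. It is an assumption for illustration only and is not taken from the TRANAIR formulas: the switching function `nu = max(0, 1 - (M_c/M)^2)`, the default cutoff value, and the function name are all hypothetical, and the flow is assumed to run in the direction of increasing index.

```python
import numpy as np

def retarded_density(rho, mach, mach_cutoff=0.95):
    """First-order upwinded (retarded) density along a 1D line of cells.

    Assumed switching function: nu = max(0, 1 - (M_c / M)^2), so that
    retardation switches on smoothly once the local Mach number exceeds
    the subsonic cutoff M_c.
    """
    rho = np.asarray(rho, dtype=float)
    mach = np.asarray(mach, dtype=float)
    nu = np.maximum(0.0, 1.0 - (mach_cutoff / np.maximum(mach, 1e-12)) ** 2)
    upstream = np.concatenate(([rho[0]], rho[:-1]))  # value one cell upstream
    return rho - nu * (rho - upstream)
```

Because the switch is already active slightly below Mach 1, the biased quantity does not change character abruptly at the sonic line; this is the behavior a modified flux biasing formula would need to imitate.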

The potential flow assumption produces errors in normal shock strength when the 
Mach number ahead of the shock is greater than approximately 1.4. Hafez[77] has 
developed a correction to account for entropy production, which apparently allows 
more realistic flow simulations at higher Mach numbers. This correction should be 
implemented and tested in TRANAIR. 

During the Newton iteration process, velocities emanating from boundary surfaces 
may appear. If these velocities correspond to supersonic Mach numbers then upwind- 
ing cannot be performed properly and the problem temporarily becomes singular. 
This leads to a breakdown in the solution procedure. In the case of an engine exit, 
where the velocity is supposed to emanate from the exit surface, we upwind densities 
next to the exit to a fictitious density which corresponds to free stream Mach num- 
ber. This works quite well if the exit Mach number should be subsonic and the free 
stream Mach number is also subsonic. The linearized problem is then non-singular 
and within several Newton steps the exit Mach number recovers to a subsonic value. 
If the exit Mach number is supposed to be supersonic, then the user must specify this 
Mach number (although not all values are feasible). We have not yet developed the 
input formats to allow the user to do this, but we have demonstrated that by retard- 
ing the density to a fictitious density corresponding to this specified Mach number, 
the exit velocity eventually settles to the proper value. Therefore the input formats 
should be developed. 

We have not yet developed a strategy for the case where the surface is a solid surface or a
wake rather than an exit. Density is continuous across wakes so the problem may
be handled by upwinding to densities on the opposite side. In the case of solid walls 
where the local flow should be supersonic it would be difficult to employ a fictitious 
density, since the true local Mach number is only known upon solution. It might 
be better to stop the flow direction anomaly from arising altogether by damping the 
Newton method based on velocity direction changes. 
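
One possible form of such damping is sketched below. It is a hypothetical illustration, not an implemented TRANAIR feature: the step length is halved until no velocity vector turns by more than a chosen angle, and the threshold, the shrink factor and the function names are all assumptions.

```python
import numpy as np

def max_turning_angle(u_old, u_new):
    """Largest angle (radians) between corresponding rows of two velocity arrays."""
    dot = np.sum(u_old * u_new, axis=1)
    norms = np.linalg.norm(u_old, axis=1) * np.linalg.norm(u_new, axis=1)
    cosang = np.clip(dot / np.maximum(norms, 1e-30), -1.0, 1.0)
    return float(np.max(np.arccos(cosang)))

def damped_step(u_old, du, max_angle=np.radians(30.0), shrink=0.5, min_step=1e-3):
    """Backtrack on the Newton step length t until velocities turn less than max_angle."""
    t = 1.0
    while t > min_step and max_turning_angle(u_old, u_old + t * du) > max_angle:
        t *= shrink
    return t
```

For example, a full step that would reverse a velocity of (1, 0, 0) with the update (-2, 0.1, 0) is cut back to a quarter step, after which the velocity still points downstream.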

4.1.3 Solution Adaptive Grid Improvements 

The ultimate goal of solution adaptive grid refinement is to produce an accurate 
flow simulation at a low computational cost with minimal user intervention. In its 
present form TRANAIR makes significant progress towards this goal. In working 
with the solution adaptive process a number of ideas have emerged which will carry 
TRANAIR considerably further towards this goal. Because these areas have not 
been thoroughly investigated to date, the ideas discussed below have not yet been 
implemented in the code. It is expected that after further analysis and careful testing, 
their implementation will produce significant improvements in the effectiveness of 
solution adaptive gridding (with minimal user intervention). 

In the first place, TRANAIR would provide more effective solution adaptive grids 
if it did a better job of exploiting the physical aspects of the flow. For example, if 
in some region of the flow the local Mach number exceeds the fictitious gas Mach 
number, the full potential equation is a poor approximation to the flow. Although
the estimated errors in the neighborhood of such regions may be quite high, creating
higher grid resolution in such areas provides little benefit in terms of obtaining a 
more meaningful engineering answer to the flow problem. Thus refinement should 
be limited to regions where the flow remains physically sensible. In addition, where 
real discontinuities in the flow field occur (i.e., at a shock), no matter how much 
grid refinement is applied in the vicinity of the shock, jumps in the velocities will 
still remain, and the presently implemented error indicators will remain large. Thus 
some form of automatic local “shock limiting” is desirable to prevent grid refinement 
beyond what makes sense for a given engineering application. At present there is such 
a limiter but it is determined from user input. Shocks (particularly normal shocks) 
can easily be recognized by the code, and thus it would be fairly straightforward to 
automatically introduce limits to grid refinement wherever shocks occur, without any 
user specification. 
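
As a simple illustration of how a normal shock might be recognized automatically, the sketch below flags the places along a line of cells (ordered in the flow direction) where the Mach number drops from supersonic to subsonic. The function name and the one-dimensional setting are assumptions made for the example; a production criterion would need to work on the three-dimensional grid.

```python
import numpy as np

def normal_shock_faces(mach_along_streamline):
    """Indices i where the flow goes from supersonic (cell i) to subsonic (cell i+1)."""
    m = np.asarray(mach_along_streamline, dtype=float)
    return np.nonzero((m[:-1] > 1.0) & (m[1:] < 1.0))[0]
```

A refinement limit (for example a minimum cell size) could then be imposed only in cells adjacent to the flagged faces. A subsonic to supersonic crossing is an expansion and is deliberately not flagged.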

Another easily recognized physical phenomenon related to shocks is whether the 
flow in a region is undergoing expansion or compression. With the current imple- 
mentation of solution adaptivity, grid refinement tends to be attracted to expansion 
regions (like the leading edges of a wing). Ultimately, this is a desirable phenomenon, 
but when excessive grid is attracted to an expansion region, it can happen that latent 
features of the flow field (for example, oblique shocks) do not attract adequate grid 
density. The result can be a smooth pressure distribution which hides the existence of 
the phenomenon. (The ONERA M6 wing has proven to be a very good case for ana- 
lyzing this difficulty). Thus it appears to be desirable to suppress grid refinements in 
expansion regions in the earlier steps of solution adaptive gridding. This encourages 
grid refinement in regions where these latent features might occur. Only in the final 
stages of solution adaptive gridding should the grid be permitted to cluster in expan- 
sion regions. The present code allows for suppression of grid refinement in expansion 
regions, but such regions must be identified a priori by a user. 

Some exploration has been made of alternative error indicators. The present im- 
plementation bases these on discontinuities in velocity magnitudes from cell to cell. 
This indicator provides high error estimates when the velocity magnitude jumps, but 
such error estimates are relatively small when a jump is due to a turning of the veloc- 
ity (as occurs for oblique shocks). It is possible that an error indicator based on the 
change in the direction of the velocity would provide earlier emphasis of more latent 
flow features, such as oblique shocks. 
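
The difference between the two kinds of indicator can be seen on a single pair of neighboring cell velocities. The sketch below is illustrative only; the function names are hypothetical and the actual TRANAIR indicator involves more than this two-cell comparison.

```python
import numpy as np

def magnitude_jump(u_a, u_b):
    """Indicator based on the change in velocity magnitude between two cells."""
    return abs(np.linalg.norm(u_a) - np.linalg.norm(u_b))

def direction_jump(u_a, u_b):
    """Indicator based on the turning angle (radians) between two cell velocities."""
    cosang = np.dot(u_a, u_b) / (np.linalg.norm(u_a) * np.linalg.norm(u_b))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))
```

For a velocity that turns by 10 degrees at constant speed, `magnitude_jump` is essentially zero while `direction_jump` returns the full turning angle, which is the situation described above for a flow feature dominated by turning.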

In the early applications of solution adaptivity, the same type of error estimates 
and the same grid refinement strategy have been applied for subsonic, transonic and 
supersonic free stream flows. It is quite likely that because of the physical differences 
in these problems, different solution adaptive strategies may be beneficial. 

Ultimately an overall strategy for solution adaptive refinement will emerge from 
these ideas which will consist of a number of error indicators along with an appropriate 
weighting as the solution adaptive gridding continues in order to: 

• capture the latent features of the flow early, when the computational costs are 
low; 

• recognize regions where true discontinuities occur and limit refinement to length
scales that make good engineering sense;

• provide sufficient grid resolution in expansion and stagnation areas to produce 
accurate solutions on the final grid; 

• automatically recognize regions of non-physical flow features and avoid wasting 
grid resolution there; 

• automatically (or with very little user intervention) detect regions where there is
little interest in resolving high flow gradients, such as wing tips, while still
allowing the user to study these areas if that is the design goal;

• provide a near optimal strategy for solution adaptive gridding for subsonic, tran- 
sonic, and supersonic flows using only knowledge of the free stream Mach num- 
ber. 

The elements of this final strategy have been identified and some issues have been 
explored. Significant improvements can be made in the adaptive gridding, resulting 
in more efficient and accurate solutions. 

4.1.4 Higher Order Elements 

Higher order finite elements are currently being developed for structures applications 
[78], [79] with remarkable success. Incompressible Navier-Stokes calculations are also 
being attacked with these methods [80], [81]. For smooth problems, these methods 
offer exponential order convergence (better than any algebraic order of convergence) 
in the number of unknowns. For problems with singularities, they offer substantially 
better algebraic rates of convergence. These methods have the same rate of con- 
vergence as spectral methods, but are not subject to the same limitations such as 
rectangular domains and separable grids. 

Thus, there is the potential for large savings in CPU time and storage in TRANAIR 
with the proper use of higher order finite elements. In particular, in solving the full 
potential equation the important flow quantities are defined in terms of the velocity, 
which is only first order accurate in the mesh spacing locally. Since the velocity is
first order, to achieve a factor of two reduction in error currently requires eight times
as many elements. With second order velocity only about three times as many elements
(a factor of $2^{3/2} \approx 2.8$) are required. Thus, the payoff of higher order methods is great.
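
The element counts quoted here follow from a simple scaling argument: in three dimensions the number of elements grows like $h^{-3}$, while the error behaves like $h^{k}$ for a method of order $k$. A minimal sketch of this arithmetic (the function name is hypothetical, and uniform refinement is assumed):

```python
def element_growth(error_reduction, order, dim=3):
    """Factor by which the element count grows to cut the error by error_reduction.

    Assumes uniform refinement: error ~ h**order and element count ~ h**(-dim).
    """
    return error_reduction ** (dim / order)
```

With `order=1` a factor of two in error costs `element_growth(2.0, 1)`, i.e. eight times as many elements, whereas with `order=2` the cost is `2**1.5`, roughly 2.8.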

Higher order basis functions that have successfully been used include Tchebychev 
polynomials, Legendre polynomials, and other higher order polynomials. Continuous 
basis functions seem to be much more flexible than ones with more degrees of
inter-element continuity. In TRANAIR, one would implement a triquadratic element basis
function for the potential. This would require a trilinear approximation to the den- 
sity. The density would be an interpolating polynomial fitting four points in every 
element. Integrals would probably best be evaluated with numerical quadrature rules. 
A sophisticated adaptive strategy would be needed to determine where to use these 
higher order elements and where to use the currently implemented trilinear elements.
An unresolved issue is how to do higher order upwinding in supersonic regions. We
can observe however, that a second order upwinding of density could be achieved
without any more element connectivity information than is currently in TRANAIR.

There have recently been significant advances in iterative methods for the mod- 
erately sparse linear systems resulting from these discretizations. The solution tech- 
nique can use a recent result of Babuska that the lowest order stiffness matrix is an 
excellent preconditioner for the higher order finite element problem. Thus, the cur- 
rent Jacobian matrix calculation and decomposition could be used as the direct solver 
preconditioner. If successful, this would save significant coding and computational 
expenses. 

Thus, the broad outlines of a higher order method for TRANAIR have been 
thought out. Such a method could provide much greater accuracy at reasonable 
cost than current methods which are all second order in the potential. This could 
enable the accurate calculation of such sensitive measures of performance as inviscid 
drag, which current methods cannot predict.


4.2 EULER FORMULATION 

4.2.1 Properties of Euler Equations 

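The steady state Euler equations express conservation of mass, momentum and
energy as follows:

$$
\begin{array}{llll}
\text{Conserved quantity} & \text{D.E.} & \text{Flux} & \\
\text{mass} & \nabla \cdot \vec{W} = 0 & \vec{W} = \rho \vec{U} & (4.1) \\
\text{momentum} & \nabla \cdot \mathbf{m} = 0 & \mathbf{m} = \vec{W} \vec{U}^{T} + p\,\mathbf{I} & (4.2) \\
\text{energy} & \nabla \cdot \vec{E} = 0 & \vec{E} = H \vec{W} & (4.3)
\end{array}
$$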


In the first column we display the quantity conserved across discontinuity surfaces,
i.e. the normal component of flux, in the second, we display the differential equation
for each conservation law, and in the third, we display the relevant flux. To complete
the description we define total enthalpy, $H$, and entropy, $S$.

enthalpy definition

$$ H = \frac{\gamma p}{(\gamma - 1)\rho} + \frac{1}{2} q^2 , \qquad q^2 = \vec{U} \cdot \vec{U} \tag{4.4} $$

entropy definition

$$ \frac{p}{p_\infty} = e^{(\gamma - 1) S} \left( \frac{\rho}{\rho_\infty} \right)^{\gamma} \tag{4.5} $$
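
For reference, the two definitions can be evaluated as in the sketch below. It assumes a perfect gas and the forms $H = \gamma p / ((\gamma - 1)\rho) + q^2/2$ and $p/p_\infty = e^{(\gamma-1)S} (\rho/\rho_\infty)^\gamma$; the function names and the default $\gamma = 1.4$ are choices made for the example.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (air)

def total_enthalpy(p, rho, q, gamma=GAMMA):
    """Total enthalpy H = gamma*p/((gamma-1)*rho) + q^2/2 for speed q."""
    return gamma * p / ((gamma - 1.0) * rho) + 0.5 * q * q

def entropy(p, rho, p_inf, rho_inf, gamma=GAMMA):
    """Entropy S solved from p/p_inf = exp((gamma-1)*S) * (rho/rho_inf)**gamma."""
    return math.log((p / p_inf) / (rho / rho_inf) ** gamma) / (gamma - 1.0)
```

With this normalization $S$ vanishes in the free stream and along any isentropic change of state, so a non-zero value directly measures entropy production.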

In order to eliminate the possibility of expansion shocks we need an entropy condition
which we choose to introduce via artificial pressure, i.e. we redefine $\mathbf{m}$ as

$$ \mathbf{m} = \vec{W} \vec{U}^{T} + \left( p - \vec{\epsilon} \cdot \nabla_{-}\, p \right) \mathbf{I} \tag{4.6} $$

Here $\nabla_{-}\, p$ is a backward derivative and $\vec{\epsilon}$ a vector in the direction of $\vec{U}$ having
magnitude on the order of the grid size. The $\vec{\epsilon}$ is non-zero only in supercritical
regions. For the discussion which follows it suffices to ignore artificial pressure.

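To better understand the Euler equations we can recombine Eqns. (4.1)-(4.3) in
the following ways

$$ (\text{energy}) - H\,(\text{mass}) = \vec{W} \cdot \nabla H = 0 \tag{4.7} $$

$$ (q^2 - H)\,(\text{mass}) - \vec{U} \cdot (\text{momentum}) + (\text{energy}) = \frac{p}{\rho}\, \vec{W} \cdot \nabla S = 0 \tag{4.8} $$

$$ (\text{momentum}) - \vec{U}\,(\text{mass}) = -\vec{W} \times \vec{\omega} - p \nabla S + \rho \nabla H = 0 \tag{4.9} $$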

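Here $\vec{\omega} = \nabla \times \vec{U}$ is the vorticity vector. Equation (4.9) can be rewritten in a better
way by introducing the concept of swirl, i.e.

$$ G = \frac{\vec{W} \cdot \vec{\omega}}{\rho^2 q^2} = \text{swirl} \tag{4.10} $$

Then Eqn. (4.9) becomes

$$ \vec{\omega} = G\, \vec{W} + \frac{p}{\rho^2 q^2}\, \vec{W} \times \nabla S - \frac{1}{\rho q^2}\, \vec{W} \times \nabla H \tag{4.11} $$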
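
The step from Eqn. (4.9) to Eqn. (4.11) can be filled in as follows. This is a sketch written with $\times$ for the cross product, using the swirl $G = \vec{W} \cdot \vec{\omega} / (\rho^2 q^2)$ and $\vec{W} \cdot \vec{W} = \rho^2 q^2$: Eqn. (4.9) is crossed with $\vec{W}$ and the triple product is expanded.

```latex
\begin{align*}
\vec{W} \times \vec{\omega} &= \rho \nabla H - p \nabla S
  && \text{(Eqn. 4.9)} \\
\vec{W} \times (\vec{W} \times \vec{\omega})
  &= (\vec{W} \cdot \vec{\omega})\, \vec{W} - (\vec{W} \cdot \vec{W})\, \vec{\omega}
   = \rho^2 q^2 \left( G\, \vec{W} - \vec{\omega} \right) \\
\Rightarrow \quad \vec{\omega}
  &= G\, \vec{W} - \frac{1}{\rho^2 q^2}\, \vec{W} \times \left( \rho \nabla H - p \nabla S \right) \\
  &= G\, \vec{W} + \frac{p}{\rho^2 q^2}\, \vec{W} \times \nabla S
     - \frac{1}{\rho q^2}\, \vec{W} \times \nabla H
\end{align*}
```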

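Using Eqn. (4.1) and the fact that $\nabla \cdot \vec{\omega} = 0$ we obtain an equation for $G$ by taking
the divergence of Eqn. (4.11), i.e.

$$ \vec{W} \cdot \nabla G = - \nabla \cdot \left[ \frac{p}{\rho^2 q^2}\, \vec{W} \times \nabla S - \frac{1}{\rho q^2}\, \vec{W} \times \nabla H \right] \tag{4.12} $$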


Now let us assume we have an initial estimate of $\vec{W}$, and let $G$, $S$ and $H$ be the
fundamental unknowns. Equation (4.7) is a convection equation for H and states
that H is constant along streamlines. If H is specified at the head of every streamline 
then H may be found at every point in the flow field. In particular if H is the same 
constant at the head of every streamline then H will be identically constant in the 
flow field and the flow will be isoenergetic. Mechanisms which produce non-constant 
H include propellers and jet engines. The appropriate value of H must be specified 
at the head of each streamline leaving these mechanisms. From Eqn. (4.8) we see 
that S also satisfies a convection equation. Entropy must also be specified at the 
exit of propulsion devices. However, entropy also has field sources in the case that 
dissipation is present, e.g., when Eqn. (4.6) is operable. Then Eqn. (4.8) will have a 
non-zero right hand side. These “convection” sources should be negligible except at 
a shock. Once H and S are known, Eqn. (4.12) becomes a convection equation for G 
with a specified right hand side field source. This equation can be integrated to give G 
everywhere in the flow field once G has been specified at the head of every streamline. 
If S and H are constant in the flow field, then the only source of swirl is via boundary 
conditions at the head of streamlines. Again, propulsion devices produce swirl, but 
another major source is the Kutta condition at trailing edges. From Eqn. (4.11) we
see that in the absence of variations in S and H vorticity can only be produced by
swirl. In fact, free vortex sheets in potential flow are produced entirely by swirl.

Once $S$, $H$ and $G$ are found everywhere in the flow field, $\vec{\omega}$ may be determined
from Eqn. (4.11). If we decompose $\vec{U}$ into a scalar and vector potential

$$ \vec{U} = \nabla \Phi + \nabla \times \vec{A} \tag{4.13} $$

then $\vec{A}$ may be determined by taking the curl of both sides, i.e.

$$ \nabla \times \left( \nabla \times \vec{A} \right) = \vec{\omega} \tag{4.14} $$

As in potential flow, $\Phi$ may now be determined from the mass conservation Eqn. (4.1)
and the specified boundary condition on $\vec{U}$. Unfortunately, this is not the same $\Phi$ as
in potential flow. Even in portions of the field where $\vec{\omega}$ is zero, $\vec{A}$ will not necessarily
be zero, since Eqn. (4.14) spreads $\vec{A}$ to the whole flow field.

An approach which casts the Euler equations as a more direct generalization of 
potential flow is based on the Bateman variational principle. Here we seek a stationary 
value of a payoff, J, defined by 

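$$ J = \iiint_V p^{*}\, dV , \qquad p^{*} = p^{*}(\vec{U}, H, S) \tag{4.15} $$

Here $p^{*}$ is an arbitrary function of $\vec{U}$, $H$ and $S$. Let us define

$$ \vec{W} = -\frac{\partial p^{*}}{\partial \vec{U}} , \qquad p = -\frac{\partial p^{*}}{\partial S} , \qquad \rho = \frac{\partial p^{*}}{\partial H} \tag{4.16} $$

If we choose $p^{*}$ to be pressure as the usual function of $\vec{U}$, $H$ and $S$, then Eqn. (4.16)
is consistent with the usual definitions of $\vec{W}$, $p$ and $\rho$. (One could also choose $p^{*}$ to
be the second-order expansion of $p$ about free-stream conditions, in which case $\vec{W}$
becomes the usual linear mass flux vector used in panel methods. Rolled up vortex
sheets are possible with such an approximation, but not shocks.) Taking a variation
of $J$ and neglecting higher order terms we obtain

$$ \delta J = - \iiint_V \left[ \vec{W} \cdot \delta \vec{U} + p\, \delta S - \rho\, \delta H \right] dV \tag{4.17} $$

Let us now use a Clebsch decomposition of the velocity vector, i.e.

$$ \vec{U} = \nabla \Phi + \vec{Q} , \qquad \vec{Q} = \mu \nabla \lambda - S \nabla \eta + (H - H_\infty) \nabla \zeta \tag{4.18} $$

Then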


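$$ \delta \vec{U} = \nabla \delta \Phi + \delta \mu\, \nabla \lambda + \mu \nabla \delta \lambda - \delta S\, \nabla \eta - S \nabla \delta \eta + \delta H\, \nabla \zeta + (H - H_\infty) \nabla \delta \zeta \tag{4.19} $$

Hence

$$
\begin{aligned}
\delta J = {} & - \iiint_V \left[ (\vec{W} \cdot \nabla \lambda)\, \delta \mu - (\vec{W} \cdot \nabla \eta)\, \delta S + (\vec{W} \cdot \nabla \zeta)\, \delta H + p\, \delta S - \rho\, \delta H \right] dV \\
& - \iiint_V \left[ \vec{W} \cdot \nabla \delta \Phi + \mu\, (\vec{W} \cdot \nabla \delta \lambda) - S\, (\vec{W} \cdot \nabla \delta \eta) + (H - H_\infty)\, (\vec{W} \cdot \nabla \delta \zeta) \right] dV
\end{aligned}
\tag{4.20}
$$

Integrating the second integral by parts,

$$
\begin{aligned}
\delta J = {} & \iiint_V \left[ (\nabla \cdot \vec{W})\, \delta \Phi - (\vec{W} \cdot \nabla \lambda)\, \delta \mu + \nabla \cdot (\mu \vec{W})\, \delta \lambda \right. \\
& \qquad - (p - \vec{W} \cdot \nabla \eta)\, \delta S + (\rho - \vec{W} \cdot \nabla \zeta)\, \delta H \\
& \qquad \left. - \nabla \cdot (S \vec{W})\, \delta \eta + \nabla \cdot \left( (H - H_\infty) \vec{W} \right) \delta \zeta \right] dV \\
& - \iint_S \left[ (\vec{W} \cdot \hat{n})\, \delta \Phi + (\mu \vec{W} \cdot \hat{n})\, \delta \lambda - (S \vec{W} \cdot \hat{n})\, \delta \eta + \left( (H - H_\infty) \vec{W} \cdot \hat{n} \right) \delta \zeta \right] dS
\end{aligned}
\tag{4.21}
$$

Assuming that $\delta J$ is stationary we obtain the following equations

$$ \nabla \cdot \vec{W} = 0 \tag{4.22} $$

$$ \vec{W} \cdot \nabla \lambda = 0 \tag{4.23} $$

$$ \nabla \cdot (\mu \vec{W}) = \vec{W} \cdot \nabla \mu = 0 \tag{4.24} $$

$$ p - \vec{W} \cdot \nabla \eta = 0 \tag{4.25} $$

$$ \rho - \vec{W} \cdot \nabla \zeta = 0 \tag{4.26} $$

$$ \vec{W} \cdot \nabla S = 0 \tag{4.27} $$

$$ \nabla \cdot (H \vec{W}) = 0 \tag{4.28} $$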

The first equation is the mass equation and the last is the energy equation. The 
others are then equivalent to the momentum equation as can be seen by substituting 
Eqn. (4.18) into Eqn. (4.9). 

Given an initial estimate of $\vec{W}$, Eqn. (4.27) and Eqn. (4.28) can be solved as
convection equations for $S$ and $H$. Then $p$ and $\rho$ may be evaluated from Eqn. (4.16).
The convection Eqns. (4.23)-(4.26) can be solved for the adjoints $\lambda$, $\mu$, $\eta$ and $\zeta$.
This determines $\vec{Q}$ via Eqn. (4.18). The potential $\Phi$ may then be determined from
Eqn. (4.22), i.e.

$$ \nabla \cdot (\rho \nabla \Phi) = - \nabla \cdot (\rho \vec{Q}) \tag{4.29} $$

If the flow is known to be isoenergetic, then $H$ can be set equal to $H_\infty$ and we can
delete Eqn. (4.28) and Eqn. (4.26) from the system. One can proceed similarly for
isentropic flow. By choosing $\lambda$ and $\mu$ to be zero at upstream infinity we guarantee
that $\vec{Q}$ exists only where vorticity is present. Hence wherever potential flow exists,
$\Phi$ alone defines the flow. Euler flow can be interpreted as potential flow with field
sources Eqn. (4.29) which exist only in regions having vorticity and whose strengths
are determined from convection Eqns. (4.23)-(4.28).

4.2.2 Problems with Euler Equations 

Although we have cast the Euler equations as a generalization of the full potential 
equation, the introduction of convection equations creates a considerable number of 
numerical problems. In this section we discuss the problems we feel must be addressed 
and to some extent resolved before a considerable investment is made in a production 
full configuration Euler code. (The results shown below are obtained from a special
test code written for the Euler equations.)

The first problem concerns false production of a convected quantity. As a rule, 
we want to solve the conservation equations (4.1)-(4.3) in conservative form, not 
simply to capture discontinuities correctly, but to calculate accurate total forces and 
moments for large, complex configurations where truncation errors are hard to control. 
However, if we solve the Euler equations in conservative form, convection equations 
such as (4.8) will effectively have non-zero field sources on the right hand side due 
to truncation errors. This means that entropy may increase or decrease along a 
streamline when it should remain constant. The error is not locally confined, since 
false entropy which is generated upstream will convect downstream. In many current 
codes entropy production is responsible for poor drags and boundary layer matching 
as well as premature separation. 

Even if Eqn. (4.8) were to be solved directly, numerical diffusion errors would still
create problems. Convection operators such as $\vec{W} \cdot \nabla$ all have inherent diffusion due
to truncation errors. For grids used for inviscid modeling this numerical diffusion 
is orders of magnitude greater than that produced by viscous terms of the Navier- 
Stokes equations, hence nonphysical results are possible. The numerical diffusion is 
greatest when crossflow gradients are largest, e.g. at slip surfaces. To illustrate the 
problem we consider channel flow over a rectangular bump. A numerical solution 
was obtained using a rectangular grid ad hoc test code. In Figure 4.1 we show 
the results of convecting a smooth distribution of entropy at the entrance using a 
fairly good upwind discretization of the $\vec{W} \cdot \nabla$ operator. The isentropic curves
correspond closely to streamlines. This is true even for the bottom streamline which 
passes through regions of stagnation as well as large expansion. In Figure 4.2 we 
show results in the case of a discontinuous initial distribution of entropy. Here a 
considerable amount of diffusion takes place even on streamlines which lie in a region 
of relatively uniform flow. Clearly such diffusion must be eliminated if one wishes to 
calculate the effects of wing wakes on downstream components of the configuration. 
One can use non-diffusive numerical schemes which require every value of entropy to 
be precisely equal to some upstream value in the absence of legitimate dissipation. 
In Figure 4.3 we show the results of using such a scheme. Obviously there are no 
diffusive errors. However displacement errors are still possible when using such a 
scheme although they are rather small for this particular case. The major problem is
that a non-diffusive scheme is difficult to implement in a flux conservative formulation,
and it is this problem which will require considerable effort to solve. There are a 
variety of possible approaches which introduce additional degrees of freedom so that 
the convection and flux conservation equations may be solved simultaneously. The 
Clebsch formulation is one such approach, and a displaced location for convected 
unknowns is another. These approaches will also solve the false entropy production 
problem as well. 
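
The contrast between an interpolating upwind scheme and a non-diffusive one can be reproduced on a much simpler model than the channel flow of the figures. The sketch below convects an entropy profile through a uniform flow with direction (1, slope) on a square grid. It is an illustration under stated assumptions, not the test code referred to above: the function names are hypothetical, the upwind scheme is a plain first order one, and the non-diffusive scheme simply traces each streamline back to the inflow column and copies the nearest inflow value without interpolation.

```python
import numpy as np

def convect_upwind(inflow, n_cols, slope=0.5):
    """March S column by column with first-order upwind differencing (0 <= slope <= 1)."""
    inflow = np.asarray(inflow, dtype=float)
    S = np.empty((n_cols, len(inflow)))
    S[0] = inflow
    for i in range(1, n_cols):
        # Value one cell lower in the previous column (copied at the lower wall).
        below = np.concatenate(([S[i - 1, 0]], S[i - 1, :-1]))
        S[i] = (1.0 - slope) * S[i - 1] + slope * below  # interpolation => diffusion
    return S

def convect_nondiffusive(inflow, n_cols, slope=0.5):
    """Assign every cell the value of the nearest inflow cell on its streamline."""
    inflow = np.asarray(inflow, dtype=float)
    n = len(inflow)
    j = np.arange(n)
    S = np.empty((n_cols, n))
    for i in range(n_cols):
        j0 = np.clip(np.rint(j - slope * i).astype(int), 0, n - 1)
        S[i] = inflow[j0]  # no interpolation: only upstream values can appear
    return S
```

With a step in the inflow profile, the upwind result develops a band of intermediate values that widens downstream, whereas the non-diffusive result contains only the two inflow values; its only error is a displacement of the jump by at most half a cell.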

A third problem with the Euler equations concerns uniqueness. There are cer- 
tain situations where an Euler solution cannot exist without separation and closed 
streamlines[83]. Obviously the level of the convected quantities is indeterminate on 
closed streamlines. In fact the size of the separation region itself will depend on what 
value one’s program happens to assign to the convected quantities. Thus quantities 
such as drag and lift will turn out to be somewhat arbitrary. There is not much 
one can do about this problem except try to eliminate false entropy production so 
that one achieves an unseparated solution when it exists. If such a solution does not 
exist then probably one should strive for a solution which minimizes the extent of the 
separation region. 

A fourth problem with Euler equations concerns vortex separation (or swirl gener- 
ation). The Euler equations do not contain enough physics to predict the location of 
separation lines or strength of separation except in special cases such as sharp edges. 
Thus one must be able to effect separation on the basis of outside knowledge. The first 
task is to clean up false entropy and swirl production so that premature separation 
does not take place. Secondly one must be able to specify the separation line and type 
of separation directly. Pure vortex separation is achieved by specifying a source for 
swirl only. (Allowing entropy increases will lead to contaminated vortex separation.) 
The strength of the swirl sources must be determined by a Kutta condition. 

A final problem concerns vortex instabilities and related non-existence. Current 
literature[84] seems to indicate that the Euler equations have a legitimate solution 
only in special cases (e.g. potential flow). Vorticity seems to collect in unstable cores 
with increasing concentration, and blowup may occur in finite time. The blowup 
can be prevented by numerical diffusion. However in attempting to eliminate excess 
numerical diffusion for other reasons we may encounter vortex instabilities, and then 
the question becomes how much numerical diffusion is correct. This can be determined 
only by considering the full Navier-Stokes equations. 

Figure 4.1: $\vec{W} \cdot \nabla S = 0$ (Good Upwind Discretization Scheme), Smooth Inflow
Distribution

Figure 4.2: $\vec{W} \cdot \nabla S = 0$ (Good Upwind Discretization Scheme), Discontinuous Inflow
Distribution

Figure 4.3: $\vec{W} \cdot \nabla S = 0$, Non-Diffusive Scheme, Every Value of Entropy Equal to
Some Upstream Value. No Interpolation Allowed.

4.3 WAKE CAPTURING

We believe that if we can implement a good wake capturing scheme in TRANAIR then
we will be able to handle 85% to 95% of the cases that an Euler solver could handle
with much less risk, development cost, and run cost. This is due to the fact that
most inviscid problems of interest in full configuration analysis really involve regions
of potential flow separated by vortex sheets. These regions may possess different
total pressures and temperatures, but we have already demonstrated the ability of
TRANAIR to account for such effects (see Section 5.2.6). It is true that shocks
generate volume vorticity. This vorticity is generally negligible except for extremely
strong shocks, hence in most cases the Hafez correction [77] can probably produce the
correct shock strength. It is true that TRANAIR cannot model volume swirl effects,
but such effects could be modeled by collecting the swirl into vortex sheets and then
employing the wake capturing feature described below. Hence, it is clear that the
addition of a wake capturing algorithm will make TRANAIR rival many Euler codes
in capability, while offering the advantage of reliability, efficiency and the ability to
analyze extremely complex geometries without a great deal of user effort.

A variety of approaches for capturing wakes, short of solution of the full Euler 
equations, are possible. Unfortunately, many of them suffer from the same problem 
that was discussed in the previous section, i.e., excessive numerical diffusion arising 
from the discretization of convection equations. It is true that the adaptive grid 
capability might allow us to ignore the problem by concentrating sufficient grid in 
wakes to keep diffusion under control. However, a rough calculation shows that to 
convect vortex cores and sheets from the wing to tail with sufficient accuracy to be 
able to predict accurate tail loads would require at least triple the grid used to solve 
the problem with fixed wakes. The cost would currently be prohibitive. We have 
therefore been exploring compromises. 

There are many arguments to consider in developing a wake capturing algorithm, 
some of which have been mentioned above. We have finally arrived at what we be- 
lieve is a reasonable compromise between accuracy and user effort. This algorithm 
is based on a novel method developed and communicated to us by S. S. Desai[86] 
which combines vortex tracing methods with a non-linear full potential algorithm. 
The authors assume separation lines are specified and then emit discrete vortex fila- 
ments from these lines. These vortex filaments are aligned with the mean flow which 
is determined by combining the velocity induced by these vortices with the velocity 
computed on the full potential grid. This method often works quite well, but occa- 
sionally has a few problems. First the computation of the velocity induced by the 
vortex filaments is expensive, and second, this velocity is highly singular, resulting in 
vacuum conditions near each filament. 
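
The origin of the vacuum conditions can be seen from the simplest case. The sketch below assumes an isolated, infinite straight filament (speed $\Gamma / (2 \pi r)$) and ignores the background flow; the function names are hypothetical. Since the static enthalpy is $H - q^2/2$, pressure and density fall to zero where the induced speed reaches $\sqrt{2H}$.

```python
import math

def filament_speed(circulation, r):
    """Speed induced at distance r by an infinite straight vortex filament."""
    return abs(circulation) / (2.0 * math.pi * r)

def vacuum_radius(circulation, total_enthalpy):
    """Distance at which the induced speed alone reaches q_max = sqrt(2 H)."""
    q_max = math.sqrt(2.0 * total_enthalpy)
    return abs(circulation) / (2.0 * math.pi * q_max)
```

Inside `vacuum_radius` the 1/r velocity of the filament exceeds the maximum speed the compressible flow can sustain, which is why a less singular representation of the wake is attractive.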

We are currently analyzing several modifications to this method. First we note 
that vortex filaments are equivalent to the edges of constant strength doublet pan- 
els. By employing linearly varying doublet panels instead, the 1/r singularities can 
be eliminated. Moreover, by interpreting the doublet panels as jumps in potential,
one can take their influence into account through local jump conditions rather than 
through influence coefficients. The net effect of these two modifications is equivalent 
to a method whereby the positions of the current wake type-18 networks are updated 
so that each doublet panel side edge is aligned with the local mean flow direction. 
(We have checked that this condition is still applicable when the total pressure and 
temperature are different on the upper and lower sides of the doublet panels. Here 
one must calculate the mean flow direction using upper and lower mass flux vectors 
scaled by appropriate factors based on total pressure and temperature.) 

At the moment we are accounting for the effect of wake panels on the local flow 
by incorporating them in the local D-region operators. This limits the generality of 
wake shape by requiring that wake panels cannot cut themselves or any portion of the
vehicle boundary. Moreover, D-region operators must be recomputed every time the
wake position is updated. It would be better to “capture” the discontinuities induced 
by the doublet panels by computing their contribution to the field operators on-the- 
fly, using methods similar to those used in capturing thin layers in EM TRANAIR[40]. 
Such a procedure would, for example, allow a wing wake sheet to cut a tail without 
having to separate the wake sheet into two pieces. 

The main advantage of the technique just described is that wake diffusion is virtu- 
ally eliminated. Moreover, one does not have the expense of adding extra unknowns 
or derivatives everywhere in the flow field just to account for the possible existence 
of a wake. In one sense the method could be called “wake fitting”. However the 
analogy to “shock fitting” doesn’t really hold. Only the separation line really needs 
to be specified by the user. This is reasonable since separation is basically a viscous 
phenomenon. Once the sheet gets started there is no problem associated with
“recognizing” a wake as there is in shock fitting. In fact the method is much more closely
related to the Clebsch decomposition. Here the $\mu$ parameter is precisely the doublet
strength in the wake and the gradient of the $\lambda$ parameter corresponds to the
normal vector of the doublet sheet.


4.4 BOUNDARY LAYER 

In the vast majority of flow cases of practical interest the effects of viscosity are 
confined to a boundary layer next to the configuration surface. The influence of 
a boundary layer on the outer inviscid flow can be of major significance in some 
instances. One instance is where boundary layer separation produces a vortex sheet 
extending out into the inviscid flow field. Another instance is where shock-boundary 
layer interaction effects cause substantial thickening of the boundary layer and a 
correspondingly large modification of the effective configuration surface as seen by the 
outer inviscid flow. The latter effect is often of great importance in transonic analyses. 
In Figure 4.4 we show an analysis performed by Boeing’s A488 code[87] on the 747- 
200 wing at Mach 0.86 and at 2.70 degrees angle of attack. The code was run with 
and without boundary layer coupling and the results were compared to experiment. 
The coupled results are in much better agreement with the experimental results, and 
the primary effect of the boundary layer appears to be a weakening and upstream 
displacement of the normal shock. For some wings the effect is less pronounced, but 
one cannot know this without doing the actual boundary layer analysis. 

It would be a fairly straightforward task to couple the boundary layer code in 
A488 to TRANAIR. However, boundary layer codes often tend to be the weak link 
in a flow analysis. Transition models, turbulence models, and shock-boundary layer 
interaction models are all very ad-hoc in nature. There is not much that can be 
done about this. In addition, coupled transonic/boundary layer codes often have 
convergence problems due to the coupling itself. The input to most boundary layer 
codes is the inviscid pressure distribution and the attachment line. The boundary 
layer code proceeds in a marching fashion to generate boundary layer thickness for 
delivery back to the inviscid code, and the inviscid code generates a new solution. 
This procedure is repeated until convergence, which is not always achieved. Moreover,
premature separation can halt the boundary layer marching procedure. 

Due to the use of a sparse direct solver TRANAIR could be coupled to a boundary 
layer code directly. That is, the boundary layer equations could be treated in the same 
manner as the pressure equations (see Section B.3). Moreover, because of the direct 
nature of the solution, marching is no longer necessary (although the equations would 
be arranged in marching order to minimize fill-in). Hence, it would be possible to 
allow the introduction of elliptic terms, resulting in, e.g., the thin layer Navier-Stokes 
equations. 

4.5 DESIGN AND OPTIMIZATION 

TRANAIR currently has a rudimentary sequential inverse design capability. It allows 
the user to specify pressure coefficient at upper surface corner points of a thick surface 
network, or jump in pressure coefficient and thickness slope at corner points of a thin 
surface network. In the first case the network surface is to be relofted parallel to 
the upper surface mass flux vector. In the second case the surface is to be relofted 
parallel to the average mass flux vector. No relofting capability has been attached 
to TRANAIR, as such a capability is strongly case dependent and intimately tied to 
one’s geometry generation system. 

The procedure just described is similar to the cycled boundary layer coupling 
described in the previous subsection. It can often be effective, but is certainly not 
robust and may require intervention by an expert user. We again prefer a more direct 
approach based on the use of the sparse solver. In such an approach the parameters 
describing variations in geometry would be combined with flow unknowns and the 
whole system including pressure specification and impermeability conditions would 
be solved as a directly coupled system. 

Development of a directly coupled inverse design program would represent a major 
step towards a full optimization capability. Here an actual payoff function would 
be minimized with respect to a set of controls (geometry perturbation parameters), 
subject to inequality constraints on these controls as well as the state equations (full 
potential equation). 

Chapter 5

CONCLUSIONS 


A new approach to solving the full potential equation about arbitrary three-dimensional
geometries has been presented. This approach has been implemented in a computer
code called TRANAIR. A wide variety of subsonic, transonic and supersonic results
have been presented. They indicate that TRANAIR has made substantial progress 
towards the objective of offering aerospace vehicle designers a reliable, general pur- 
pose, full configuration flow analysis tool that is relatively easy to use. In particular, 
these results show that it is indeed possible to eliminate the costly and time consum- 
ing process of generating a surface fitted grid while maintaining the ability to capture 
small scale flow details accurately. Further work to improve TRANAIR and extend 
its domain of applicability has been discussed.

Appendix A


OCT-TREE DATA 
STRUCTURES 

A.1 DATA STRUCTURE ORGANIZATION

A compact data structure which contains essentially all the information regarding the 
refined grid has been developed. It allows the TRANAIR code to concentrate many 
small boxes in areas where greater solution detail is needed and fewer and larger boxes 
in areas where less solution detail is desired. While usually referred to as “the oct-tree”,
the data structure is actually a forest of oct-trees where each oct-tree root is a box in 
a uniform, regularly indexed grid. The data structure allows efficient extraction of a 
variety of information, such as the location of nodes and element centroids, box size, 
box level, node indices, box adjacency, and identity of boundary boxes. 


A.1.1 Base Grid 

The global grid (described in Section 2.3.3) specified over the computational domain 
is uniformly derefined to obtain the base grid. Each base grid box becomes the root 
of an oct-tree. All the descendent boxes of each base grid box physically lie within 
that base grid box. 


A.1.2 Oct-Trees

Each box in the data structure can be recursively subdivided (refined) into eight 
similar boxes. The hierarchy of boxes formed in this process is known as an oct-tree. 
The oct-tree data structure represents a parent-child relationship between a box and 
the sub-boxes formed by its subdivision and also the sibling relationship between the
sub-boxes.

Some restrictions are placed on the refinement to minimize data structure size and 
to simplify the problem. First, boxes are refined by subdivision into exactly eight 
equally sized sub-boxes. This greatly reduces the data structure size by eliminating 
the need to store box centroids and sizes. It also has the effect of keeping the aspect 
ratio of all boxes equal. Box centroids and sizes are derived from the box’s position in
the oct-tree hierarchy. Second, no two face or edge neighbors in a “legal” refined grid
differ by more than one level (see Figure A.1), which greatly reduces the complexity
of the solution computation.



Figure A.1: Grid “Legalization” Example.

The basic oct-tree data structure is based on boxes. However, it has been extended 
to accommodate nodal information. A node is located at a corner point of one or more 
boxes. Nodes are indexed by assigning a box to the node at its lower-left-near corner. 
To account for all nodes at refinement interfaces, a pseudo-refinement is performed 
(Figure A.2). Pseudo-refinement creates boxes (called pseudo-boxes) that are assigned
to nodes, but are not used as finite elements in the solution process.


Figure A.2: Pseudo-refinement to Represent the Nodes
(figure annotation: Pseudo Refinement to Identify this Node)

A.1.3 Terminology

• B-Box (Base Grid Box): Any of the base grid boxes in the uniform, regularly
indexed grid derived from the global grid. Base grid boxes are the roots of each
of the oct-trees and have no parents.

• O-box: Any box in the oct-tree.

• R-Box (Red Box): Any unrefined box. An R-Box can contain a finite element
trial function.

• G-Box (Green Box): A pseudo-box created so that all nodes at refinement
interfaces are associated with a box. Some G-boxes can lie outside the computational
domain to define the nodes on the boundary of the computational domain.

• U-Box: Any box (or pseudo-box) whose lower-left-near corner is associated with
a node.

• T-box: An R-box that intersects the boundary. 

A.2 DATA STRUCTURE REPRESENTATION

The data structure used in TRANAIR to describe oct-trees is a modification of that
described by Samet [57]. The data structure is divided into six areas: the header, the
base grid descriptor, the refinement family, a stack, a boundary box map, and
refinement pointers. A global overview of the data structure array is shown in Figure A.3.

A.2.1 Header

The header area is fixed in size and location. It maintains a variety of information 
about the data structure including the locations of its various components and certain 
statistical information about the data structure and the grid. 

A.2.2 Base Grid Descriptors

The header area is followed by a base grid descriptor area. This area is divided into 
two regions, the base grid pointers and the U-box accumulators for the base grid. The 
two regions are laid out as parallel arrays of the size of the base grid. Each element 
of the base grid pointer region identifies the location of a refinement family for a 
particular base grid box. A zero value here implies that the base grid box is not 
refined. The U-box accumulators are the number of U-boxes encountered in the data 
structure during a sequential traversal of the data structure. 

Figure A.3: The Overview of the Oct-tree Data Structure

A.2.3 Refinement Families

The base grid descriptor area is followed by a series of refinement families. A refine- 
ment family describes a subdivision of a box into eight smaller boxes. A refinement 
family is shown in Figure A.4. Each refinement family data block consists of five
integer fields:

Figure A.4: A Refinement Family
(fields shown: Parent; Refinement Pointers; Number of Refined Children; U-box Accumulator; Octant)



• The first field contains the address of the parent. The parent pointer is used to 
facilitate upward traversals in the tree. 

• The second field contains the location of a data block that describes the children 
of this refinement family. A zero value in this field indicates that none of the 
eight children are refined. 

• The third field describes the number of refined children contained in the block 
referenced by the second field. 

• The fourth field is the U-box accumulator. It describes the number of U-boxes 
encountered in the data structure during a sequential traversal of the data struc- 
ture. 

• The fifth field is the octant of this refinement family in the refinement of its 
parent. 

Because the third (number of refined children) and fifth (octant) fields contain 
small values, it is reasonable to compress these values into other data fields to conserve 
memory. As a result, only three integers are used to store the refinement families. 
The first integer is equal to the first field. The second integer contains the second
and third fields of the data structure. The values for these fields are given by: 

FIELD #2 = (INTEGER #2) / 16 
FIELD #3 = ABS ( (INTEGER #2) MOD 16) 

The third integer contains the fourth and fifth fields. The values for these fields 
are given by: 

FIELD #4 = (INTEGER #3) / 16 
FIELD #5 = ABS( (INTEGER #3) MOD 16) 
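
The packing rule above can be sketched in a few lines. This is only an illustration, assuming truncating integer division (as in Fortran) and that the ABS in the formulas exists because a packed integer may carry a sign; the function names `pack` and `unpack` are hypothetical and do not appear in TRANAIR.

```python
def pack(pointer, small):
    """Combine a non-negative pointer and a small value (0..15) into one integer."""
    if pointer < 0 or not 0 <= small < 16:
        raise ValueError("pointer must be >= 0 and small must be in 0..15")
    return pointer * 16 + small


def unpack(word):
    """Recover (word / 16, ABS(word MOD 16)) with truncation toward zero."""
    magnitude = abs(word)
    sign = -1 if word < 0 else 1
    # Python's // floors, so the sign is handled separately to mimic truncation.
    return sign * (magnitude // 16), magnitude % 16


if __name__ == "__main__":
    word = pack(1234, 5)
    print(word, unpack(word))
```

Sixteen is used as the divisor even though the small fields never exceed eight, which leaves the low four bits free for the small value.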

As boxes are refined, refinement families are appended to this list. This area grows 
towards the end of the total data structure. 

A.2.4 Scratch Stack

The refinement family area is followed by a small scratch stack area that is used to 
record traversal paths. The stack area is not used until after all refinements have 
been made and the refinement family list has stopped growing. 

A.2.5 T-box Map

The stack area is followed by a T-box mapping vector. The mapping describes which
O-boxes have a non-empty intersection with the boundary. The map is constructed
as an ordered (ascending) array of those O-boxes that intersect the boundary. The
creation of this map is the final step in the generation of the oct-tree data structure. 

A.2.6 Refinement Pointers

The final area in the data structure contains the refinement pointers. The refinement 
pointers describe the addresses of the refinement families for the children of refinement 
families. This area is composed of variable size blocks. Each block is composed of a 
set of number pairs that describe the address of a refinement family, and the octant 
that refinement family lies in. An example refinement block is shown in Figure A.5.
Because the octant value is small, the octant and pointer values are stored together
in one integer. The pointer and octant values are given from a data value as:

POINTER = (Data Value) / 16
OCTANT = ABS( (Data Value) MOD 16)

Additionally, a back pointer is provided to facilitate oct-tree construction. The 
back pointer contains the address of the refinement family whose refinements are 
being described by this block. This pointer, like the others, is stored with an octant
value. Back pointers are assigned an octant value of zero to distinguish them from
sibling refinement pointers. The back pointers are removed to conserve memory after 
the oct-tree is constructed. A flag in the header field indicates the presence of the 
back pointers. 

The refinement pointer area grows from the end of the data structure toward the 
refinement families. When insufficient space remains for growth the data structure 
size is enlarged within the code so that expansion can occur. 

Figure A.5: A Refinement Pointer Block

A.3 MAJOR ALGORITHMS

This section describes the major algorithms used to manipulate and interrogate the
oct-tree data structure. The descriptions of the algorithms are sketches and are not
intended to be exhaustive explanations.

A.3.1 Data Structure Modification

Refining a Box 

When a given (unrefined) O-box is refined, a refinement family is added to the refine- 
ment families area of the data structure. The parent of the new refinement family is 
the O-box being refined. The octant of the box being refined is defined by: 

OCTANT = ((O-box - NXYZB - 1) MOD 8) + 1

where NXYZB is the number of base grid boxes. The refinement pointer block of the 
O-box’s parent refinement family is modified so that the OCTANT octant of the pointer 
block contains the address of the new refinement family. 

Creating the O-box/U-box Mapping 

The O-box/U-box mapping is created by storing in each refinement family an accu- 
mulation of the number of U-boxes encountered in the boxes defined by families that 
precede it in the list. 

Creating the T-box/O-box Mapping

The T-box/O-box mapping is implemented as an ordered (ascending) list of O-boxes. 

A.3.2 Data Structure Interrogation

Finding Refinement Family 

The index of the refinement family defining an O-box is given by: 

FAMILY = ((O-BOX - NXYZB - 1) / 8) + 1
where NXYZB is the number of base grid boxes. 

Determining Box Number From Refinement Family Index 

The O-box number of a refinement family can be found by: 

OBOX = NXYZB + 8 * FAMILY + OCTANT 

where NXYZB is the number of base grid boxes and OCTANT is the octant of this family 
in the refinement of its parent. 
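
The index arithmetic above can be checked with a short sketch. It assumes that O-boxes beyond the base grid are numbered consecutively in groups of eight and that integer division truncates; the function names are hypothetical. The inverse used here counts families from one, to match the FAMILY formula. The printed inverse with `8 * FAMILY` would agree with it if FAMILY were counted from zero, and which convention the code uses is not stated here.

```python
def octant_of(obox, nxyzb):
    """Octant (1..8) of an O-box that is not a base grid box."""
    return (obox - nxyzb - 1) % 8 + 1


def family_of(obox, nxyzb):
    """One-based index of the refinement family that defines the O-box."""
    return (obox - nxyzb - 1) // 8 + 1


def obox_of(family, octant, nxyzb):
    """Inverse of the two functions above (assumes a one-based family index)."""
    return nxyzb + 8 * (family - 1) + octant


if __name__ == "__main__":
    nxyzb = 112  # hypothetical number of base grid boxes
    for obox in (113, 120, 121):
        print(obox, family_of(obox, nxyzb), octant_of(obox, nxyzb))
```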

Finding Box Centroid and Size 

Box centroids and sizes are calculated by ascending the oct-tree hierarchy to the 
base grid root. At each level reached in the hierarchy, the box centroid (initially 
(0, 0, 0)) is moved towards the parent box’s centroid by examining the current octant
and number of levels traversed. When the base grid box is reached, the calculated
centroid is translated to its center. The size of the box is equal to

1/(2**LVL) * BGSIZE

where LVL is the number of levels traversed and BGSIZE is a vector describing the 
dimensions of the base grid box. 
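
A minimal sketch of this upward traversal follows. The octant encoding is an assumption made for illustration (octant minus one read as three bits, with bit 0 selecting the high-x half, bit 1 the high-y half and bit 2 the high-z half); the document does not state how TRANAIR numbers octants. The function name and argument layout are likewise hypothetical.

```python
def centroid_and_size(base_center, bgsize, octants_leaf_to_root):
    """Return (centroid, size) of a box given its octants from the box up to the root."""
    lvl = len(octants_leaf_to_root)
    size = [d / 2 ** lvl for d in bgsize]          # 1/(2**LVL) * BGSIZE
    offset = [0.0, 0.0, 0.0]                       # centroid relative to the root center
    child = list(size)                             # dimensions of the box being examined
    for octant in octants_leaf_to_root:
        bits = octant - 1
        for axis in range(3):
            sign = 1.0 if (bits >> axis) & 1 else -1.0
            offset[axis] += sign * child[axis] / 2  # child center relative to its parent
        child = [2 * d for d in child]              # move up one level
    centroid = [c + o for c, o in zip(base_center, offset)]
    return centroid, size


if __name__ == "__main__":
    print(centroid_and_size((0.0, 0.0, 0.0), (8.0, 8.0, 8.0), [1, 8]))
```

Because only the octant path is needed, no centroid or size has to be stored per box, which is the memory saving described in Section A.1.2.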

Finding Child Boxes 

A given box has either eight or no children. If a refinement family for the box exists
then it has eight children. Otherwise, the box is unrefined (has no children). Child
O-box numbers are given by:

OBOX = NXYZB + 8 * FAMILY + OCTANT

where NXYZB is the number of base grid boxes and OCTANT is a number lying between
one and eight.

Finding Neighboring Boxes

Neighboring boxes are found by traversing up the oct-tree hierarchy until either an
ancestor common to both the box and the neighboring box is found or the root is
reached (see Figure A.6). The common ancestor is found if the last parent in the
pedigree is on the complementary side of the child (i.e., if the south neighbor is
desired then the parent should be on the north side of its child). A downward
traversal along a path complementary to the upward path is then performed. If the
root is reached and the ancestor is yet to be found then the neighboring box lies in a
different tree in the oct-tree forest. The neighboring tree is the base grid box lying in
the desired direction. For a legal tree, there can be zero, one, or four neighbors in any
face direction and zero, one, or two neighbors in any edge direction. In Figure A.6 the
traverse path for finding the north neighbor of box 1 is shown as an illustration.
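
The up-then-down traversal can be sketched on octant paths alone. This is an illustration, not the TRANAIR routine: it assumes the same hypothetical octant encoding as a three-bit number (bit 0 for x, bit 1 for y, bit 2 for z), handles only face neighbors, and returns the path of the same-level neighbor. In a real tree that path may have to be truncated (coarser neighbor) or extended (finer neighbors).

```python
def face_neighbor_path(path, axis, direction):
    """Octant path (root to box) of the same-level face neighbor.

    path      -- list of octants 1..8 from the tree root down to the box
    axis      -- 0, 1 or 2 for x, y or z
    direction -- +1 or -1
    Returns None when the neighbor lies in a different tree of the forest.
    """
    new = list(path)
    for level in range(len(new) - 1, -1, -1):      # upward traversal
        bits = new[level] - 1
        on_high_side = (bits >> axis) & 1
        new[level] = (bits ^ (1 << axis)) + 1      # complementary (mirrored) octant
        if on_high_side != (direction > 0):
            return new                             # common ancestor found
    return None                                    # root reached without success


if __name__ == "__main__":
    print(face_neighbor_path([1, 2], 0, +1))
    print(face_neighbor_path([2], 0, +1))
```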

Mapping O-box and U-box Indices

To find the U-box index of an O-box, the refinement family of the O-box must be
found. If the O-box is refined and the refinement is not a pseudo-refinement, then it
is not a U-box. Otherwise, the U-box accumulator is recovered from the refinement
family. The U-box accumulator is then decremented by one for each sibling whose
octant is greater than that of the O-box, and who is not refined or who is a
pseudo-refinement.

The O-box index of a given U-box can be determined by performing a binary 
search on the (ordered) list of U-box accumulators to find the refinement family whose 
U-box accumulator is closest to that of the given U-box. The U-box accumulator 
is incremented by one for each child who is either unrefined or who is a pseudo- 
refinement until the U-box accumulator is the desired U-box. The O-box is the last 
child who caused the U-box accumulator to be incremented. 

Mapping O-box and T-box Indices

Finding the O-box associated with a particular T-box index simply involves retrieving
the value from the T-box map. A binary search is performed to recover the T-box
index given an O-box. 

Finding the U-Box Containing a Point 

To find the U-box containing a point in the computational volume, determine the 
base grid box containing the point. Traverse the oct-tree rooted at the base grid 
box by choosing, at each level, the child box that contains the point. The traversal 
terminates when a leaf node of a pseudo-refinement is found. 
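
The descent can be sketched as follows. The tree representation is a stand-in chosen for brevity (a leaf is `None`, a refined box is a list of eight children) and ignores pseudo-refinements; the octant encoding is the same hypothetical three-bit convention used for illustration elsewhere in this appendix.

```python
def locate(tree, center, size, point):
    """Return the octant path (root to leaf) of the leaf box containing point."""
    center = list(center)
    size = list(size)
    path = []
    node = tree
    while node is not None:                 # descend until an unrefined box is reached
        bits = 0
        for axis in range(3):
            size[axis] /= 2                 # child dimensions
            if point[axis] >= center[axis]:
                bits |= 1 << axis
                center[axis] += size[axis] / 2
            else:
                center[axis] -= size[axis] / 2
        path.append(bits + 1)
        node = node[bits]
    return path


if __name__ == "__main__":
    inner = [None] * 8
    tree = [None] * 7 + [inner]             # only the eighth child is refined
    print(locate(tree, (0.0, 0.0, 0.0), (8.0, 8.0, 8.0), (1.0, 1.0, 1.0)))
```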

Appendix B

OPERATOR DEFINITION 

In this appendix we describe how the various operators (discrete equations) in TRANAIR 
are defined and constructed. 

B.1 IMPLEMENTATION

In this section we discuss the way in which the Bateman principle is used to create
operators at each grid point. The departure point is Eqn. (2.11), which is repeated
here:

    \delta J = \int_\Omega -\rho \vec{V} \cdot \delta\vec{V} \, d\Omega                    (B.1)

We take $\Omega$ as some subregion of the grid box shown in Figure B.1. First we define
perturbation potential $\phi$ as

    \vec{V} = \vec{V}_\infty + \nabla\phi, \qquad \vec{V}_\infty = (U_\infty, V_\infty, W_\infty)                    (B.2)

and then rewrite Eqn. (B.1) as

    \delta J = \int_\Omega -\rho (\vec{V}_\infty + \nabla\phi) \cdot \delta\nabla\phi \, d\Omega                    (B.3)

We define $\rho_0$ as $\rho(q_0^2)$ in the following equation

    \rho = \rho_\infty \left[ 1 + \frac{\gamma - 1}{2} M_\infty^2 \left( 1 - \frac{q^2}{q_\infty^2} \right) \right]^{\frac{1}{\gamma - 1}}                    (B.4)

where $q_0$ is the value of $q$ at the centroid of $\Omega$. We then make the approximation that

    \delta J \approx -\rho_0 \int_\Omega (\vec{V}_\infty + \nabla\phi) \cdot \delta\nabla\phi \, d\Omega                    (B.5)

This approximation is valid in incompressible flow where $\rho$ is constant. If the basis
function for $\phi$ in $\Omega$ is linear then $\vec{V}$ is constant in $\Omega$ and Eqn. (B.5) is still valid.
However the basis function for $\phi$ is actually trilinear and strictly speaking $\vec{V}$ varies
in $\Omega$, but because we evaluate $\rho_0$ at the centroid of $\Omega$, Eqn. (B.5) and (B.3) are the
same to second order.
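
The basis function for $\phi$ in the grid box of Figure B.1 is defined as

    \phi = \phi_0 + \phi_x x + \phi_y y + \phi_z z + \phi_{xy} xy + \phi_{yz} yz + \phi_{zx} zx + \phi_{xyz} xyz                    (B.6)

where the origin for the x, y, z coordinate system is in the center of the box. Note
that this expression has eight unknown coefficients. These coefficients are chosen so
that $\phi$ exactly fits the eight values $\phi_1, \phi_2, \phi_3, \phi_4, \phi_5, \phi_6, \phi_7, \phi_8$ at the box corners.
To express this fact in compact form we define

    \xi = \frac{x}{\Delta x}, \qquad \eta = \frac{y}{\Delta y}, \qquad \zeta = \frac{z}{\Delta z}                    (B.7)

Note that for the box of Figure B.1, $|\xi| \le \frac{1}{2}$, $|\eta| \le \frac{1}{2}$, $|\zeta| \le \frac{1}{2}$. We rewrite Eqn. (B.6)
as

    \phi = \phi_0 + \phi_\xi \xi + \phi_\eta \eta + \phi_\zeta \zeta + \phi_{\xi\eta} \xi\eta + \phi_{\eta\zeta} \eta\zeta + \phi_{\zeta\xi} \zeta\xi + \phi_{\xi\eta\zeta} \xi\eta\zeta                    (B.8)

where $\phi_\xi = \Delta x \, \phi_x$, etc. This can be rewritten as

    \phi = b \cdot \omega                    (B.9)

where

    b = (1, \xi, \eta, \zeta, \eta\zeta, \zeta\xi, \xi\eta, \xi\eta\zeta)                    (B.10)

and

    \omega = (\phi_0, \phi_\xi, \phi_\eta, \phi_\zeta, \phi_{\eta\zeta}, \phi_{\zeta\xi}, \phi_{\xi\eta}, \phi_{\xi\eta\zeta})                    (B.11)

But for some matrix R,

    \omega = R \tau                    (B.12)

where

    \tau = (\phi_1, \phi_2, \phi_3, \phi_4, \phi_5, \phi_6, \phi_7, \phi_8)                    (B.13)

Thus

    \phi = b \cdot R \tau                    (B.14)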

The matrix R can be calculated by evaluating $\phi$ of Eqn. (B.9) at each of the eight
corner points to obtain

    \tau = B \omega                    (B.15)

Here B is an 8x8 matrix whose rows are b evaluated at each of the eight corner points.
From Eqn. (B.12) R is the inverse of B. One can use a computer to invert B, but
it is more satisfying to combine rows of B in a judicious manner to deduce R. For
example, by adding all the rows of Eqn. (B.15) together, we get
$\phi_1 + \phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6 + \phi_7 + \phi_8 = 8\phi_0$, from which we deduce the first row of R.
R is the following matrix.

                    (B.16)

Defining

    u = \phi_x, \qquad v = \phi_y, \qquad w = \phi_z                    (B.17)

we have from Eqn. (B.14) that

    u = \frac{1}{\Delta x} b_\xi \cdot R \tau                    (B.18)

    v = \frac{1}{\Delta y} b_\eta \cdot R \tau                    (B.19)

    w = \frac{1}{\Delta z} b_\zeta \cdot R \tau                    (B.20)

Here

    b_\xi = (0, 1, 0, 0, 0, \zeta, \eta, \eta\zeta)                    (B.21)

    b_\eta = (0, 0, 1, 0, \zeta, 0, \xi, \zeta\xi)                    (B.22)

    b_\zeta = (0, 0, 0, 1, \eta, \xi, 0, \xi\eta)                    (B.23)
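
As a cross-check on the relation R = B^{-1}, the inverse can be computed exactly with rational arithmetic. The sketch below makes two assumptions that are not confirmed by the text at this point: corner 1 sits at (-1/2, -1/2, -1/2) with x varying fastest and corner 8 at (+1/2, +1/2, +1/2), and the basis vector is ordered as b = (1, ξ, η, ζ, ηζ, ζξ, ξη, ξηζ), which is the ordering implied by the derivative vectors in Eqns. (B.21) to (B.23).

```python
from fractions import Fraction
from itertools import product

H = Fraction(1, 2)
# Assumed corner numbering: x varies fastest, then y, then z.
CORNERS = [(x, y, z) for z, y, x in product((-H, H), repeat=3)]


def b(xi, eta, zeta):
    """Trilinear basis vector evaluated at one point."""
    return [1, xi, eta, zeta, eta * zeta, zeta * xi, xi * eta, xi * eta * zeta]


def invert(a):
    """Exact Gauss-Jordan inverse of a square matrix."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


B = [b(*corner) for corner in CORNERS]   # rows of B are b at the corners
R = invert(B)

if __name__ == "__main__":
    for row in R:
        print([str(v) for v in row])
```

Under these assumptions the first row of R is 1/8 in every column, in agreement with the row-summing argument given above.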

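One can now rewrite Eqn. (B.5) as

    \delta J \approx -\rho_0 \left[ a_\infty \cdot R \, \delta\tau + \tau^T R^T C R \, \delta\tau \right]                    (B.24)

where

    a_\infty = \int_\Omega \left( \frac{U_\infty}{\Delta x} b_\xi + \frac{V_\infty}{\Delta y} b_\eta + \frac{W_\infty}{\Delta z} b_\zeta \right) d\Omega                    (B.25)

and

    C = \int_\Omega \left( \frac{1}{\Delta x^2} b_\xi^T b_\xi + \frac{1}{\Delta y^2} b_\eta^T b_\eta + \frac{1}{\Delta z^2} b_\zeta^T b_\zeta \right) d\Omega                    (B.26)

One can rewrite Eqn. (B.25) as

    a_\infty = \frac{\Omega U_\infty}{\Delta x} (0, 1, 0, 0, 0, \bar{\zeta}, \bar{\eta}, \overline{\eta\zeta})
             + \frac{\Omega V_\infty}{\Delta y} (0, 0, 1, 0, \bar{\zeta}, 0, \bar{\xi}, \overline{\zeta\xi})
             + \frac{\Omega W_\infty}{\Delta z} (0, 0, 0, 1, \bar{\eta}, \bar{\xi}, 0, \overline{\xi\eta})                    (B.27)

where $\Omega$ is the volume of the region $\Omega$ and the superscript bar denotes mean value,
e.g.,

    \overline{\eta\zeta} = \frac{1}{\Omega} \int_\Omega \eta\zeta \, d\Omega                    (B.28)

Similarly we have

    C = \frac{\Omega}{\Delta x^2}
        \begin{pmatrix}
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 1 & 0 & 0 & 0 & \bar{\zeta} & \bar{\eta} & \overline{\eta\zeta} \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & \bar{\zeta} & 0 & 0 & 0 & \overline{\zeta^2} & \overline{\eta\zeta} & \overline{\eta\zeta^2} \\
        0 & \bar{\eta} & 0 & 0 & 0 & \overline{\eta\zeta} & \overline{\eta^2} & \overline{\eta^2\zeta} \\
        0 & \overline{\eta\zeta} & 0 & 0 & 0 & \overline{\eta\zeta^2} & \overline{\eta^2\zeta} & \overline{\eta^2\zeta^2}
        \end{pmatrix}
      + \frac{\Omega}{\Delta y^2}
        \begin{pmatrix}
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 1 & 0 & \bar{\zeta} & 0 & \bar{\xi} & \overline{\zeta\xi} \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & \bar{\zeta} & 0 & \overline{\zeta^2} & 0 & \overline{\zeta\xi} & \overline{\zeta^2\xi} \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & \bar{\xi} & 0 & \overline{\zeta\xi} & 0 & \overline{\xi^2} & \overline{\zeta\xi^2} \\
        0 & 0 & \overline{\zeta\xi} & 0 & \overline{\zeta^2\xi} & 0 & \overline{\zeta\xi^2} & \overline{\zeta^2\xi^2}
        \end{pmatrix}
      + \frac{\Omega}{\Delta z^2}
        \begin{pmatrix}
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 1 & \bar{\eta} & \bar{\xi} & 0 & \overline{\xi\eta} \\
        0 & 0 & 0 & \bar{\eta} & \overline{\eta^2} & \overline{\xi\eta} & 0 & \overline{\xi\eta^2} \\
        0 & 0 & 0 & \bar{\xi} & \overline{\xi\eta} & \overline{\xi^2} & 0 & \overline{\xi^2\eta} \\
        0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & \overline{\xi\eta} & \overline{\xi\eta^2} & \overline{\xi^2\eta} & 0 & \overline{\xi^2\eta^2}
        \end{pmatrix}                    (B.29)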
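
In general $\Omega$ is a subdomain of the box bounded by polygons consisting of parts of
subpanels and faces of the box. The mean quantities in Eqn. (B.27) and (B.29) can
in general be evaluated only by using complicated recursions best programmed on
the computer; see Section 2.3. However there are some special cases for which hand
calculations are possible. Let us assume that $\Omega$ is a small subregion of the box which
is close to a corner, say corner 8. Then clearly the mean values on $\Omega$ are almost the
same as the values at corner 8, e.g.,

    \bar{\xi} = \bar{\eta} = \bar{\zeta} = \frac{1}{2}, \qquad \overline{\xi\eta} = \frac{1}{4}, \qquad \overline{\xi\eta\zeta} = \frac{1}{8}                    (B.30)

Another case of interest is when $\Omega$ is a rectangular region, e.g., the whole box, the
upper half, the lower tenth, etc. Let us assume that $\Omega$ is defined by $x_1 < x < x_2$,
$y_1 < y < y_2$, $z_1 < z < z_2$. Then the term $\overline{\xi^{L-1} \eta^{M-1} \zeta^{N-1}}$ is given by

    \overline{\xi^{L-1} \eta^{M-1} \zeta^{N-1}}
      = \frac{1}{\Omega} \iiint_\Omega \xi^{L-1} \eta^{M-1} \zeta^{N-1} \, d\Omega
      = \frac{\int_{x_1}^{x_2} \xi^{L-1} dx}{\int_{x_1}^{x_2} dx} \cdot \frac{\int_{y_1}^{y_2} \eta^{M-1} dy}{\int_{y_1}^{y_2} dy} \cdot \frac{\int_{z_1}^{z_2} \zeta^{N-1} dz}{\int_{z_1}^{z_2} dz}
      = \frac{\int_{\xi_1}^{\xi_2} \xi^{L-1} d\xi}{\int_{\xi_1}^{\xi_2} d\xi} \cdot \frac{\int_{\eta_1}^{\eta_2} \eta^{M-1} d\eta}{\int_{\eta_1}^{\eta_2} d\eta} \cdot \frac{\int_{\zeta_1}^{\zeta_2} \zeta^{N-1} d\zeta}{\int_{\zeta_1}^{\zeta_2} d\zeta}
      = \frac{1}{L \cdot M \cdot N} \cdot \frac{\xi_2^L - \xi_1^L}{\xi_2 - \xi_1} \cdot \frac{\eta_2^M - \eta_1^M}{\eta_2 - \eta_1} \cdot \frac{\zeta_2^N - \zeta_1^N}{\zeta_2 - \zeta_1}                    (B.31)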


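Case 1:

Now assume that $\Omega$ is the whole box, i.e.,

    \xi_2 = \eta_2 = \zeta_2 = \frac{1}{2}, \qquad \xi_1 = \eta_1 = \zeta_1 = -\frac{1}{2}                    (B.32)

Clearly if L, M or N is even, the mean value on the left of Eqn. (B.31) vanishes. The
only non-zero mean values in Eqn. (B.27) and (B.29) are then

    \overline{\xi^2} = \overline{\eta^2} = \overline{\zeta^2} = \frac{1}{12}, \qquad
    \overline{\eta^2\zeta^2} = \overline{\zeta^2\xi^2} = \overline{\xi^2\eta^2} = \frac{1}{144}                    (B.33)

Case 2:

Next assume that $\Omega$ is only the upper half of the box. Then $\xi_2 = \frac{1}{2}$, $\xi_1 = -\frac{1}{2}$,
$\eta_2 = \frac{1}{2}$, $\eta_1 = -\frac{1}{2}$, $\zeta_2 = \frac{1}{2}$, $\zeta_1 = 0$. If L is even or M is even the corresponding mean value
vanishes. However the same thing is not true for N. The non-zero mean values are
now

    \bar{\zeta} = \frac{1}{4}, \qquad
    \overline{\xi^2} = \overline{\eta^2} = \overline{\zeta^2} = \frac{1}{12}, \qquad
    \overline{\eta^2\zeta} = \overline{\zeta\xi^2} = \frac{1}{48}, \qquad
    \overline{\eta^2\zeta^2} = \overline{\zeta^2\xi^2} = \overline{\xi^2\eta^2} = \frac{1}{144}                    (B.34)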
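
The closed form in Eqn. (B.31) is easy to evaluate exactly, which gives a quick check of the mean values quoted for Case 1 and Case 2. The sketch below is illustrative only; the function names and the way the region is passed in are choices made here, not part of TRANAIR.

```python
from fractions import Fraction


def mean_power(lo, hi, n):
    """Mean of t**(n-1) over [lo, hi], i.e. (hi**n - lo**n) / (n * (hi - lo))."""
    return (hi ** n - lo ** n) / (n * (hi - lo))


def mean_monomial(L, M, N, region):
    """Mean of xi**(L-1) * eta**(M-1) * zeta**(N-1) over a rectangular region."""
    (x1, x2), (y1, y2), (z1, z2) = region
    return mean_power(x1, x2, L) * mean_power(y1, y2, M) * mean_power(z1, z2, N)


H = Fraction(1, 2)
WHOLE_BOX = ((-H, H), (-H, H), (-H, H))
UPPER_HALF = ((-H, H), (-H, H), (Fraction(0), H))

if __name__ == "__main__":
    print(mean_monomial(3, 1, 1, WHOLE_BOX))    # mean of xi**2 over the whole box
    print(mean_monomial(1, 1, 2, UPPER_HALF))   # mean of zeta over the upper half
```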


Let us go back to Case 1 above and assume $\Delta x = \Delta y = \Delta z = 1$. Substituting
Eqn. (B.33) into (B.27) and (B.29) we get

    a_\infty = (0, U_\infty, V_\infty, W_\infty, 0, 0, 0, 0)                    (B.35)

and

    C = \mathrm{diag}\left( 0, 1, 1, 1, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{48} \right)                    (B.36)

Hence,


6 J = p 0 [AJi + AJ 2 + AJ 3 + AJ 4 + AJ 5 4" AJ& + AJj + A«/g] (B.3 1 ) 

where 


AJi = <5<^i (+-j- -I — + 
4 4 

U V 

A J 2 = 6 M~+^ l + 

A J 3 = ~ + 

U V 

AJ 4 = 8<t> 4 { — f- + -7 1 - 

4 4 


Wco <f> x 64 </>6 <t>7 d>a x 

4 3 + 12 + 12 + 12 12 J 

Woo <t>2 , <f>3 , <t>5 . <f>7 <t >8 S. 

4 3 + 12 + 12 + 12 + 12 ; 

Woo 4> 2 <f>3 , <f>S , <f >6 ^8, 

4 + 12 3 + 12 + 12 + 12 } 

Woo <f> 1 <t>4 05 06 ^7, 

4 + 12 3 + 12 12 12 j 
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U V 

A J s = ^ 5 (+-f + -f - 
u V 

AJ 6 = 6M—r + -T- - 

4 4 

U V 

Aj r = 6M+^r--r- 

4 4 

U V 

4 4 


Woo . 02 . 03 . 04 05 ^8, 

4 12 12 12 “ T + l2 j 

Wx> 01. 03 04 06 07 x 

4 12 12 + 12 ~~ 3 + 12 j 

^OO 01 02 04 06 07 * 

4 12 + l2 + l2 + l2 _ y j 

^oc . 01_ . 02^ . 0^3 05. 08 X 

4 12 12 + 12 + 12 3 J 


The quantities inside the parentheses are the contribution of the box to the operators
at each of the eight corner points respectively. For a uniform grid, each corner
point of the grid gets contributions to its operator from each of the eight surrounding
boxes. Consider the center grid point in Figure B.2. For the box in the upper
right corner this point is local corner point number 1 and gets the contribution in
Eqn. (B.37) from the coefficient of $\delta\phi_1$. However for the box in the lower left corner
this point is local corner point number 8 and gets the contribution from the coefficient
of $\delta\phi_8$. Note that if $\rho_0$ is different in each of the eight boxes then we must add the
terms in parentheses weighted by each $\rho_0$. However let us assume $M_\infty = 0$ in which
case $\rho_0 = 1$. Then when we add up all eight box contributions to the center grid
point we get the coefficients shown in Figure B.2.

Note that the terms in $U_\infty$, $V_\infty$, $W_\infty$ all cancel out. This is due to the fact that
$\nabla \cdot \vec{V}_\infty = 0$. The operator coefficients in Figure B.2 are known as the Bateman
Laplacian, which is a second-order accurate approximation to $\nabla^2 \phi$. This is not the
same as the more diagonally dominant finite difference Laplacian (which we call a
“lumped” Laplacian) shown in Figure B.3. The 7-point Laplacian can be obtained
from a finite element point of view in a variety of ways. One way is to add higher
order terms to the trilinear basis function; see Section 2.3. Another way is to add
higher order terms to the Bateman principle itself. In the latter method one adds to
$\delta J$ the term


<5J « 



+ 

+ 

+ 


+ WS) + + U>) 

+ V, 2 )} 

_1_ Ax* Ax’ , 

36* 2 yi 2 * r 


(B.38) 


These “lumping” terms are negligible to second-order so that they do not affect 
the accuracy of the resultant operator. 
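
Figure B.2: Bateman Laplace Coefficients

We leave it to the reader to show that the lumped formula analogous to Eqn. (B.37)
is

    \delta J = \rho_0 [\Delta J_1 + \Delta J_2 + \Delta J_3 + \Delta J_4 + \Delta J_5 + \Delta J_6 + \Delta J_7 + \Delta J_8]                    (B.39)

where

    \Delta J_1 = \delta\phi_1 \left( +\frac{1}{4} U_\infty + \frac{1}{4} V_\infty + \frac{1}{4} W_\infty - \frac{3}{4}\phi_1 + \frac{1}{4}\phi_2 + \frac{1}{4}\phi_3 + \frac{1}{4}\phi_5 \right)

    \Delta J_2 = \delta\phi_2 \left( -\frac{1}{4} U_\infty + \frac{1}{4} V_\infty + \frac{1}{4} W_\infty + \frac{1}{4}\phi_1 - \frac{3}{4}\phi_2 + \frac{1}{4}\phi_4 + \frac{1}{4}\phi_6 \right)

    \Delta J_3 = \delta\phi_3 \left( +\frac{1}{4} U_\infty - \frac{1}{4} V_\infty + \frac{1}{4} W_\infty + \frac{1}{4}\phi_1 - \frac{3}{4}\phi_3 + \frac{1}{4}\phi_4 + \frac{1}{4}\phi_7 \right)

    \Delta J_4 = \delta\phi_4 \left( -\frac{1}{4} U_\infty - \frac{1}{4} V_\infty + \frac{1}{4} W_\infty + \frac{1}{4}\phi_2 + \frac{1}{4}\phi_3 - \frac{3}{4}\phi_4 + \frac{1}{4}\phi_8 \right)

    \Delta J_5 = \delta\phi_5 \left( +\frac{1}{4} U_\infty + \frac{1}{4} V_\infty - \frac{1}{4} W_\infty + \frac{1}{4}\phi_1 - \frac{3}{4}\phi_5 + \frac{1}{4}\phi_6 + \frac{1}{4}\phi_7 \right)

    \Delta J_6 = \delta\phi_6 \left( -\frac{1}{4} U_\infty + \frac{1}{4} V_\infty - \frac{1}{4} W_\infty + \frac{1}{4}\phi_2 + \frac{1}{4}\phi_5 - \frac{3}{4}\phi_6 + \frac{1}{4}\phi_8 \right)

    \Delta J_7 = \delta\phi_7 \left( +\frac{1}{4} U_\infty - \frac{1}{4} V_\infty - \frac{1}{4} W_\infty + \frac{1}{4}\phi_3 + \frac{1}{4}\phi_5 - \frac{3}{4}\phi_7 + \frac{1}{4}\phi_8 \right)

    \Delta J_8 = \delta\phi_8 \left( -\frac{1}{4} U_\infty - \frac{1}{4} V_\infty - \frac{1}{4} W_\infty + \frac{1}{4}\phi_4 + \frac{1}{4}\phi_6 + \frac{1}{4}\phi_7 - \frac{3}{4}\phi_8 \right)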

Note that one can derive the coefficients in Figure B.3 by adding the appropriate 
contributions from each of the eight surrounding boxes using Eqn. (B.39). 
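
The assembly described here can be sketched numerically. The code below assumes unit boxes, $\rho_0 = 1$, and the corner numbering in which corner 1 is the low-x, low-y, low-z corner. The Bateman box coefficients are computed directly from the trilinear basis (tensor products of the one-dimensional linear-element matrices), and the lumped box coefficients are taken as -3/4 on the diagonal and +1/4 along box edges. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

M1 = [[F(1, 3), F(1, 6)], [F(1, 6), F(1, 3)]]   # 1-D integral of N_a * N_b, unit length
S1 = [[F(1), F(-1)], [F(-1), F(1)]]             # 1-D integral of N_a' * N_b'


def bateman_entry(p, q):
    """Integral of grad N_p . grad N_q over a unit box; p, q are offsets in {0,1}^3."""
    (i, j, k), (l, m, n) = p, q
    return (S1[i][l] * M1[j][m] * M1[k][n]
            + M1[i][l] * S1[j][m] * M1[k][n]
            + M1[i][l] * M1[j][m] * S1[k][n])


def lumped_entry(p, q):
    """Lumped counterpart: 3/4 on the diagonal, -1/4 along box edges, 0 otherwise."""
    differing = sum(a != b for a, b in zip(p, q))
    return {0: F(3, 4), 1: F(-1, 4)}.get(differing, F(0))


def center_stencil(entry):
    """Operator coefficients at node (0,0,0) summed over the eight adjoining boxes."""
    stencil = {}
    for box in product((-1, 0), repeat=3):        # low corner of each adjoining box
        p = tuple(-o for o in box)                # local offset of the center node
        for q in product((0, 1), repeat=3):
            node = tuple(o + c for o, c in zip(box, q))
            stencil[node] = stencil.get(node, F(0)) - entry(p, q)
    return stencil


if __name__ == "__main__":
    bateman = center_stencil(bateman_entry)
    lumped = center_stencil(lumped_entry)
    print(bateman[(0, 0, 0)], bateman[(1, 1, 0)], bateman[(1, 1, 1)])
    print(lumped[(0, 0, 0)], lumped[(1, 0, 0)])
```

In this sketch the lumped sum reproduces the familiar 7-point pattern (center -6, each axis neighbor +1), while the Bateman sum has zero coefficients on the six axis neighbors and non-zero coefficients on the edge and corner neighbors of the 27-point stencil.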

B.2 TEST CASE

Now let us consider a test case, i.e., flow past a one panel thin wing at an
angle-of-attack. For simplicity a uniformly refined grid is chosen. The geometry is shown
in Figure B.4. The grid has the dimensions NX = 7, NY = 4, NZ = 4, and the
y = 0 plane is considered to be a plane of symmetry. The grid points are numbered
in increasing x, then y, then z order. The wing has a chord of 2.5 and a span of
3. A wake panel is attached to the trailing edge. The wing plane is z = 0. In the
bottom part of Figure B.4 we show the wing planform embedded in the grid, but
actually there are no grid points at z = 0, and the points shown should be considered
as projections.

Any grid box containing any part of the boundary surface is called a T-box. There
are 10 T-boxes in the example. Any connected portion of a T-box is called a
D-region. There are 14 D-regions in this example. The D-region numbers are shown
in Figure B.4. The ordering of the D-regions is somewhat arbitrary and depends
on which T-box is processed first and also on the panel normal direction. (In this
case the wing upper normal is in the +z direction and the wake upper normal is
in the -z direction.) Some T-boxes have one D-region and others have two. For
example the inboard T-box at the wing leading edge has only one D-region (D-region
1) because the wing does not fully cut the T-box in half. Consider D-region 4 as
shown in detail in Figure B.5. This D-region has a trilinear basis function defined by
Eqn. (B.14). However, the values of $\phi$ at the corner points below the wing cannot
be used in Eqn. (B.13) because the wing introduces a discontinuity in $\phi$. Instead the
values of $\phi$ below the wing are replaced by extrapolated values from above the wing.
These values are denoted by $\psi$. We introduce an extrapolated $\psi$ any time a $\phi$ is cut
off from a neighboring $\phi$ in all four T-boxes adjoining the line segment connecting
the two grid points. Note that we do not introduce a $\psi$ simply because the two grid
points are separated in some T-box. In the thin wing problem of Figure B.4, eight
$\psi$'s are introduced as shown. All lie on the plane of symmetry. The original problem
contained unknowns $\phi_1$ through $\phi_{112}$ ordered in grid order. The $\psi$'s are considered
as additional unknown $\phi$'s, i.e., $\psi_1 = \phi_{113}$ and $\psi_2 = \phi_{114}$, etc. The corner unknowns
for D-region 4 are shown in Figure B.5. Now we proceed to derive operators for $\phi_{60}$
and $\psi_1$. First we consider the contribution to these operators from D-region 4, i.e., a
formula analogous to Eqn. (B.37). (Remember $\phi_1, \phi_2, \ldots, \phi_8$ are the local box values of
$\phi$ here and should not be confused with the global numbering for the problem.)
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Figure B.4: One Panel Wing 
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One can use Eqn. (B.34) in Eqn. (B.29) and Eqn. (B.27) to obtain 

— Po[6Jx + <5*^2 + 6J3 4- & J 4 + SJ 5 + 8J 6 4- 6J 7 4- <5J 8 ] (B.40) 

where each $\delta J_i$ is $\delta\phi_i$ multiplied by a linear combination of the freestream
components and the corner values,

$$\delta J_i = \delta\phi_i \Big( f_i + \sum_{j=1}^{8} c_{ij}\,\phi_j \Big), \qquad i = 1, \ldots, 8.$$

Here $f_i$ collects the freestream terms of row $i$, which are multiples of $U_\infty/16$,
$V_\infty/16$ and $W_\infty/8$ (for example, $f_1 = U_\infty/16 + V_\infty/16 + W_\infty/8$), and the
coefficients of the corner values are

$$[c_{ij}] = \frac{1}{48}
\begin{pmatrix}
-4 & -1 & -1 & 0 & 0 & 2 & 2 & 2 \\
-1 & -4 & 0 & -1 & 2 & 0 & 2 & 2 \\
-1 & 0 & -4 & -1 & 2 & 2 & 0 & 2 \\
0 & -1 & -1 & -4 & 2 & 2 & 2 & 0 \\
0 & 2 & 2 & 2 & -12 & 1 & 1 & 4 \\
2 & 0 & 2 & 2 & 1 & -12 & 4 & 1 \\
2 & 2 & 0 & 2 & 1 & 4 & -12 & 1 \\
2 & 2 & 2 & 0 & 4 & 1 & 1 & -12
\end{pmatrix}$$



(Note that $\Omega$, the volume of the D-region, is $\tfrac{1}{2}$ in this case.)

Now $\psi_1$ is affected by D-regions 2 and 4. Because of the plane of symmetry one
should also add in the contributions of the reflection of D-regions 2 and 4 across
the $y = 0$ plane. For this purpose we simply reflect Figure B.5, i.e., in this case,
interchange $\psi_1$ and $\phi_{39}$, $\psi_2$ and $\phi_{40}$, $\phi_{60}$ and $\phi_{67}$, and $\phi_{61}$ and $\phi_{68}$. Thus for the
contribution of the unreflected D-region 4 to $\psi_1$, we choose the contribution due to
$\delta\phi_1$ in Eqn. (B.40). However, for the contribution of the reflected D-region we choose
the contribution due to $\delta\phi_3$. Summing up all contributions, we get the operator shown
in Figure B.6. For comparison the operators generated in the code are also tabulated.
This printout shows the coefficients of the $\psi_1$ operator (INDOP=1) located at grid
point 32 (LOCOP=32) due to contributions from 2 physical boxes (NPRINT=2).
The freestream coefficients are the coefficients of $U_\infty$, $V_\infty$ and $W_\infty$, respectively. The
remaining coefficients correspond to the unknowns indexed on the left, respectively.
(The comparison between these formulas and those of the code is not exact, as the
code shifts the wing in the grid by a slight amount in order to avoid panel surfaces
landing right on grid lines. This shift also makes the $\psi_1$ operator formula depend
minutely on some other unknowns due to the fact that a small amount of the wake
is shifted into D-regions 4 and 5. These minute quantities can be ignored.)
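The $\phi,\psi$ coefficients of this operator can be cross-checked with a short script. The sketch below is not part of TRANAIR. It assumes unit grid spacing, trilinear basis functions, and that D-regions 2 and 4 are the upper halves ($1/2 \le z \le 1$) of the boxes whose lower left corners are grid points 31 and 32. The local corner ordering and the global index maps in `CONTRIBUTIONS` are likewise assumptions made for illustration, and freestream terms are omitted.

```python
from fractions import Fraction as F

# Assumed local corner order (Figure B.5): x varies fastest, then y, then z.
CORNERS = [(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]
SHAPE = {0: [F(1), F(-1)], 1: [F(0), F(1)]}   # 1 - t and t as polynomial coefficients
DSHAPE = {0: [F(-1)], 1: [F(1)]}              # their derivatives


def integral(p, q, lo, hi):
    """Exact integral over [lo, hi] of the product of two polynomials."""
    prod = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += a * b
    return sum(c * (F(hi) ** (k + 1) - F(lo) ** (k + 1)) / (k + 1)
               for k, c in enumerate(prod))


def region_matrix(zlo, zhi):
    """K[i][j] = integral of grad(N_i).grad(N_j) over the unit square times [zlo, zhi]."""
    lows, highs = (0, 0, zlo), (1, 1, zhi)
    K = [[F(0)] * 8 for _ in range(8)]
    for i, ci in enumerate(CORNERS):
        for j, cj in enumerate(CORNERS):
            m = [integral(SHAPE[a], SHAPE[b], lo, hi)
                 for a, b, lo, hi in zip(ci, cj, lows, highs)]
            s = [integral(DSHAPE[a], DSHAPE[b], lo, hi)
                 for a, b, lo, hi in zip(ci, cj, lows, highs)]
            K[i][j] = s[0] * m[1] * m[2] + m[0] * s[1] * m[2] + m[0] * m[1] * s[2]
    return K


K = region_matrix(F(1, 2), 1)

# (local row of psi_1, assumed global labels of the 8 local corners) for
# D-region 4, its reflection, D-region 2 and its reflection. Reflected corners
# at y = -1 are identified with their images at y = +1.
CONTRIBUTIONS = [
    (0, ["psi1", "psi2", 39, 40, 60, 61, 67, 68]),
    (2, [39, 40, "psi1", "psi2", 67, 68, 60, 61]),
    (1, [31, "psi1", 38, 39, 59, 60, 66, 67]),
    (3, [38, 39, 31, "psi1", 66, 67, 59, 60]),
]

operator = {}
for row, labels in CONTRIBUTIONS:
    for col, label in enumerate(labels):
        # Sign chosen to match the sign convention of the Figure B.6 printout.
        operator[label] = operator.get(label, F(0)) - K[row][col]

for label, coeff in operator.items():
    print(label, coeff)
```

Under these assumptions the script gives $-1/3$ for $\psi_1$, $-1/24$ for $\psi_2$ and $\phi_{31}$, $-1/12$ for $\phi_{39}$, $1/12$ for $\phi_{59}$, $\phi_{61}$, $\phi_{66}$, $\phi_{68}$, $1/6$ for $\phi_{67}$, and zero for $\phi_{38}$, $\phi_{40}$ and $\phi_{60}$.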

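The $\psi_1$ operator has an interesting interpretation. Writing out the operator
equation from Figure B.6 we have

$$-\tfrac{1}{24}\phi_{31} - \tfrac{1}{3}\psi_1 - \tfrac{1}{24}\psi_2 - \tfrac{1}{12}\phi_{39} + \tfrac{1}{12}\phi_{59}
+ \tfrac{1}{12}\phi_{61} + \tfrac{1}{12}\phi_{66} + \tfrac{1}{6}\phi_{67} + \tfrac{1}{12}\phi_{68} + \tfrac{1}{2}W_\infty = 0 \tag{B.41}$$

This equation can be rewritten in a more suggestive way as

$$\frac{\Delta x\,\Delta y\,\Delta z}{2}\Big[\frac{1}{\Delta z}\big(D_z^+\psi_1 + W_\infty\big)
+ \frac{1}{3}\big(D_x^2\phi_{60} + D_y^2\phi_{60}\big)
- \frac{1}{12}\big(D_x^2\psi_1 + D_y^2\psi_1\big)
+ \frac{1}{12}\,\Delta x\,\Delta y\,D_x^2 D_y^2\phi_{60}\Big] = 0 \tag{B.42}$$

where

$$D_z^+\psi_1 = \frac{\phi_{60} - \psi_1}{\Delta z}, \qquad
D_x^2\psi_1 = \frac{\psi_2 - 2\psi_1 + \phi_{31}}{\Delta x^2}, \qquad
D_y^2\psi_1 = \frac{\phi_{39} - 2\psi_1 + \phi_{39}}{\Delta y^2},$$

etc.

Figure B.5: D-region 4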
Perturbation Coefficients

$\phi,\psi$ Coefficients

 31  -0.41667E-01     32  -0.23582E-20     33  -0.22179E-15
 38   0.53951E-06     39  -0.83334E-01     40   0.53954E-06
 59   0.83335E-01     60  -0.84974E-06     61   0.83335E-01
 66   0.83336E-01     67   0.16667E-00     68   0.83335E-01
113  -0.33334E+00    114  -0.41667E-01    117  -0.45336E-15
118  -0.42667E-10    121   0.49778E-10    122   0.14222E-10

Figure B.6: Operator Coefficients for $\psi_1 = \phi_{113}$
This is the way the equation would have looked had $\Delta x$, $\Delta y$, $\Delta z$ been arbitrary.
Note that in the limit as $\Delta z \to 0$ the equation tends to $D_z^+\psi_1 + W_\infty = 0$, which
implies that the $z$ velocity on the wing is zero, i.e., the wing is an impermeable surface.
Thus the $\psi_1$ equation effectively imposes the appropriate boundary condition. To get
the operator for $\phi_{60}$ we note that it has contributions from D-regions 2 and 4 as well as
the free space boxes 59 and 60 (all boxes are numbered according to the index of the
lower left corner point). For free space boxes (i.e., boxes not containing a boundary)
we always use the lumped formula (B.39) so that away from the boundary we get
the 7-point Laplacian, consistent with the discrete Green’s function. For D-regions 2
and 4 we choose the appropriate contribution in Eqn. (B.40) and for boxes 59 and 60
we choose the appropriate contribution from Eqn. (B.39). We must also include the
contributions from the reflections of all four boxes, as they lie on a symmetry plane.
The operator coefficients for $\phi_{60}$ are displayed in Figure B.7 and compare with those
from TRANAIR, also shown in the same figure.

For operators getting contributions from D-regions 6, 7, 8, 9, 13 and 14 the com- 
putation gets more complicated because the boundaries involve a wake. Specifically 

we have to add a surface integral to Eqn. (B.5) as in Eqn. (2.17) which is rewritten 
in the form 


J = J + J^a[h- W)(A$ - n) dZ (B.43) 

Taking a variation of the surface integral Eqn. (2.17) with respect to $\phi$ and assuming
incompressible flow we get

^ ~ Po Iw alee ^ ~ ' ”) 

+ 2^ u ■ ” + ■ «)(/* + <t>L - <t>u) 

1 - 

+ 2 ” + V L • h)(6<f> L - 6<j>u)\ dE (B.44) 

Here $\hat n$ is the upper normal on the wake. (In this case $\hat n$ points in the $-z$ direction.)
$V_U$ is $\nabla\phi$ evaluated using the basis function on the upper side of the wake and $V_L$
is $\nabla\phi$ evaluated using the basis function on the lower side of the wake. Note that
upper and lower mean the same thing as in PAN AIR. In this particular case they
do not physically correspond to upper and lower surfaces. The surface integral in
Eqn. (B.44) further requires the definition of how $\mu$ varies in the wake. We assume
$\mu$ varies linearly from corner point to corner point. We leave it to the reader to
determine the contributions to the operators due to Eqn. (B.44).

Note that the $\psi_3$ operator involves unknowns on the other side of the wake. This
is clear physically since the wake is a jump discontinuity surface rather than a solid
surface. It is also clear mathematically from Eqn. (B.44) where terms involving
products of upper and lower basis functions are present. The operators shown in
Figures B.8 and B.9 involve the unknowns $\mu_1 = \phi_{121}$ and $\mu_2 = \phi_{122}$, the values of
doublet strength at the leading edge corner points of the wake network. $\mu_1$ and
$\mu_2$ have their own operators, namely $\mu_1$ should be the difference between the basis
function of D-region 6 and that of D-region 7 evaluated at the inboard wake leading
edge corner point. Similarly, $\mu_2$ is the difference between the basis function of D-region
13 and that of D-region 13 at the outboard wake leading edge corner point.
(Since there is no difference, $\mu_2 = 0$ as it should be.)

Figure B.7: Operator Coefficients for $\phi_{60}$

The operator coefficients are displayed visually in Figures B.8 and B.9, and they 
have an interesting interpretation. 

In the same way as in Eqn. (B.42) the $\psi_3$ and $\psi_7$ operators can be rewritten
suggestively in the form

Ax A yAz 1 1 

2 * Az 2 ^ 62 - Vv + Vi) + 

+ j^Dx 2 4>3 + Y^D x 2 (f> 62 — -D x 2 ip 7 

+ gDy <f> 6 l + -Dy 2 (f> 6 2 + ~Dy 2 <j> 63 

~ Yg D y 2 ^e ~ ^D 2 ^ 7 - 2-£} y 2 rp s ] = 0 (B.45) 

AxAyAz r 1 / 1 

2 1 ^2(^34- fo-in) - -ZV>i 

+ 

+ ^>.V» + Id ,- + Id, V, 

+ \d,U 33 + + i D.Vas 

- 3 - ] = 0 (B.46) 

Adding Eqn. (B.45) to (B.46) and ignoring higher order terms we get

$$\frac{\Delta x\,\Delta y\,\Delta z}{2}\Big[\frac{1}{\Delta z}\Big(\frac{\phi_{62} - \psi_3}{\Delta z} - \frac{\psi_7 - \phi_{34}}{\Delta z}\Big)\Big] = 0 \tag{B.47}$$

In the limit as $\Delta z \to 0$, this equation states that the $z$-velocity evaluated from the
upper and lower basis functions is the same, i.e., the normal velocity is continuous
across a wake.

Subtracting Eqn. (B.45) from Eqn. (B.46) and ignoring higher order terms we get

$$\frac{\Delta x\,\Delta y\,\Delta z}{2}\Big[\frac{2}{\Delta z^2}\Big(\frac{\phi_{34} + \psi_7}{2} - \frac{\phi_{62} + \psi_3}{2} - \mu_1\Big)\Big] = 0 \tag{B.48}$$

This equation states that the difference in potential between the two basis functions
evaluated at the wake plane is equal to the wake doublet strength. We see then that
the equations give us the proper jump conditions across the wake.

Figure B.8: Operator Coefficients for $\psi_3 = \phi_{115}$

Figure B.9: Operator Coefficients for $\psi_7 = \phi_{119}$

As a final comment to this section we note that the lumping terms of Eqn. (B.38) 
can also be used in boxes which have D-regions to obtain a more diagonally dominant 
operator. However, for consistency all surface integrals in the D-region must also be 
lumped. As an example a surface term of the form 

8J « —p 0 J^(h ■ V)8<i>dY, (B.49) 

must be augmented by the term 

<5-/extra = -po f [ n x ^(A y 2 U y 6ct> y + Az 2 U z 8<f> z ) 

J E 0 

U yz&^yz) 

+ n y i(A z 2 VM 2 + Ax 2 VJ4> x ) 

+ riy —{Az 2 Ax 2 V zx 8<f> zx ) 

oo 

+ n z \{Ax 2 WM x ^Ay 2 W y 8<j>y) 

0 

+ n z -Ua x 2 Ay 2 W iy 8<t> X y) ]dS (B.50) 
B.3 PRESSURE BOUNDARY CONDITION


In this section we discuss implementation of the pressure boundary condition Eqn. (2.7).
This boundary condition is generally imposed on wake sheets, which are wetted by
the flow on both sides. Potential is allowed to jump across these sheets, and the
value of this jump constitutes the degree of freedom which allows the imposition of
Eqn. (2.7). Because TRANAIR has its roots in the panel method [8], we call the
jump in potential across a wake ‘doublet strength’, and denote its value by $\mu$.

In order to discretize Eqn. (2.7) we define doublet strength at various point
locations on the wake sheets and then impose Eqn. (2.7) at a like number of discrete
(but different) locations. In Fig. B.10 we display a schematic of those wake surface
discretizations which are operable in TRANAIR. These discretizations are comprised
of networks of mesh points which are interpolated by the wake surfaces. The portion
of the surfaces interpolating four adjacent mesh points is called a panel. In Fig. B.10
the lines correspond to panel edges and their intersections to mesh points. The mesh
points are identified by a rectangular array of indices, (I=1,M) and (J=1,N), where I
is the row index and J the column index. The upper surface of the network is defined
to be that side whose normal corresponds (in a schematic sense) to the direction
$N \otimes M$. It is useful to number the four network edges as shown in Fig. B.10. Discrete
doublet parameters are located at positions denoted by the solid dot. The doublet
strength on each panel is then obtained by bilinear interpolation. Pressure boundary
conditions are imposed at the locations marked by x’s. Edge 1 (the leading edge)
is considered a special edge for each type of network. It is assumed that the mean
(average of upper and lower) tangential velocity enters the network along this edge.
Since the pressure jump condition will involve only derivatives of doublet strength,
constants of integration must be directly or indirectly specified along the leading edge
to fix the level of doublet strength along each streamline. Boundary conditions along
the leading edge are called Kutta conditions and their discrete locations are denoted
by the open dot.

Wake networks may abut other wake networks. In general, doublet strength must 
be continuous, so it is essential to ensure that the doublet parameters along the edge 
of one network match those along the edge of an adjoining network. Hence boundary 
conditions along edges of some networks may be replaced by explicit doublet matching 
conditions. 

The network type designations (6,18 and 20) arise from historical considerations 
involving the panel method [8]. Network type 6 is a full wake network, where the only 
assumption required for its employment is that the mean flow enters the network at 
the leading edge. Network type 18 is a special case of network type 6 where the doublet 
strength at any corner point is assumed equal to that of the first point in the respective 
column. This means that the derivative of doublet strength along panel column edges 
is zero. In certain instances this implies that Eqn. (2.7) is satisfied automatically at 
locations similar to the x’s for network type 6, which results in substantial savings. 
Generally one can employ type 18 networks in place of type 6 networks when the 
mean flow is roughly in the direction of panel column edges and the difference in 
total pressure and total temperature across the wake is not too great. Network type
20 is a special case of network type 18 where the doublet strengths along the leading
edge are all identical to that at the head of the first column. Hence it is a constant
strength doublet network and contains no vorticity (or jump in tangential velocity)
at all. Network type 20 is generally used as a circulation carry-over wake rather than
to account for vortex separation. This is opposed to network types 6 and 18, where
the Kutta condition at the leading edges is intended to force vortex separation along
the line where these edges adjoin a solid body. (Note in this connection that the
sheet vorticity vector $\zeta$ is equal to $\hat n \otimes \nabla\mu$.)

The imposition of Eqn. (2.7) at the points denoted by x in Fig. B.10 is relatively
straightforward, but requires some care. For example, upper surface velocity
$V_u$ and lower surface velocity $V_l$ may be evaluated by differentiating the potential
basis functions on respective sides of the wakes. Then upper surface pressure $p_u$ and
lower surface pressure $p_l$ may be calculated from Eqn. (2.14), and substituted into
Eqn. (2.7). Formally Eqn. (2.7) does not involve $\mu$, and $\mu$ is determined through its
appearance in the finite element operators (e.g. see Eqn. (B.45)). This leads to
conditioning problems since the number of doublet unknowns is unrelated to the number
of finite element equations. It is preferable to get $\mu$ involved directly in Eqn. (2.7)
through its definition as the jump in potential. For this purpose we redefine the upper
and lower surface velocity vectors by the formulas:


V 


j(K + M) + iv,. - 
j(K + v,) - iv„ - 


n ■ [V^ - (K - V5)j \ . 

^ A [ U 

n *n J 

* f 71 

n *n J 


(B.51) 

(B.52) 


Here, $V_u$ and $V_l$ are the upper and lower surface velocity vectors calculated by
differentiating the respective potential basis functions, but the (mathematically)
equivalent velocity vectors $V'_u$ and $V'_l$ are used to compute $p_u$ and $p_l$. The quantity $\tilde n$ is
the surface co-normal vector defined by

$$\tilde n = \begin{cases} \big((1 - M_\infty^2)\,n_x,\; n_y,\; n_z\big) & \text{linear flow} \\ \hat n & \text{non-linear flow} \end{cases} \tag{B.53}$$

Note that

$$\zeta = \hat n \otimes (V'_u - V'_l) = \hat n \otimes \nabla\mu, \tag{B.54}$$

i.e. the vorticity calculated from the velocities used in the pressure calculations is
a function of $\mu$ only rather than dependent on the potential basis functions. In the
case where there are no total pressure or temperature differences across the wake,
Eqn. (2.7) implies

(B.55)

Hence,

$$aW \cdot \Delta V' = 0, \tag{B.56}$$

where $aW = \tfrac{1}{2}(W_u + W_l)$, $\Delta V' = V'_u - V'_l$, and $W_u = \rho_u V_u$, $W_l = \rho_l V_l$.

Assuming the wake is a stream surface relative to the mean mass flux, i.e.

$$\hat n \cdot aW = 0 \tag{B.57}$$

we obtain the classical condition for determining wake doublet strength, i.e.,

$$aW \cdot \nabla\mu = 0 \tag{B.58}$$

Notice that if the panel column edges are aligned along $aW$, then $\mu$ is constant along
these edges and a type 18 network may be employed as mentioned above.

In the case where there are total pressure and temperature differences across the 
wake, then Eqn. (B.58) should be replaced by 

aW m • V/x* = 0 (B.59) 

Here W * is defined by Eqn. (2.22). Although Eqn. (B.59) is no longer exact, it is 
often a reasonably good approximation. 

At Kutta condition locations we impose the following boundary condition:

$$\mu - (\phi_u - \phi_l) + \epsilon\,\hat t \cdot [\nabla\mu - \nabla(\phi_u - \phi_l)] = 0 \tag{B.60}$$

Here $\hat t$ is a unit tangent vector lying along the local panel column edge, and $\epsilon$ is
a small parameter which is chosen to be approximately equal to the local field mesh
size in the $\hat t$ direction. This equation guarantees that $\mu$ is the jump in basis function
potential in the case where wake panels are denser than the field grid. The derivative
term is added in the case where the previous condition is redundant with respect
to the finite element operators. Then Eqn. (B.60) implies that the jump in basis
function velocity is well defined and finite.

We conclude by noting that all the analysis contained in this section is applicable
to the imposition of pressure boundary conditions on surfaces which are wetted by
the fluid on one side only. Such a boundary condition is often used in the design
mode, where pressure is specified and the surface is to be updated to be a stream
surface in the resultant flow. To modify the analysis it is only necessary to replace
the velocity $V_l$ by the velocity which would yield the desired upper surface pressure.
Appendix C
GMRES


In this appendix the algorithmic details of GMRES are described. The GMRES algorithm
is used as the iterative driver to drive the residual to zero. Attention is given to
the concept of preconditioning and the role it plays in assuring rapid convergence.
The advantages GMRES enjoys over related methods such as conjugate gradients are
explored.

C.1 GMRES ALGORITHM

GMRES [53] is a method for solving nonsymmetric linear systems of algebraic equa- 
tions. A modified version of GMRES which applies to nonlinear systems of equations 
is described. 

Consider a differentiable system

$$A(u) = 0 \tag{C.1}$$

of $N$ nonlinear equations in $N$ unknowns. The differential $A(u; p)$ of $A$ at $u$ in the
direction $p$ is defined by

$$A(u; p) = \lim_{\epsilon \to 0} \frac{A(u + \epsilon p) - A(u)}{\epsilon} \tag{C.2}$$

For computational purposes, $\epsilon$ is taken to be some small number, and care is taken to
ensure that the variables $u$ and the component values of $A(u)$ are reasonably scaled to
permit an accurate evaluation of $A(u; p)$.
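As an illustration of Eqn. (C.2) with a finite $\epsilon$, the following minimal sketch approximates the differential by a one-sided difference. The function name, the default value of `eps` and the small test system are assumptions made for this example only; they are not taken from TRANAIR.

```python
def directional_derivative(A, u, p, eps=1e-7):
    """One-sided difference approximation to A(u; p) of Eqn. (C.2)."""
    Au = A(u)
    Aup = A([ui + eps * pi for ui, pi in zip(u, p)])
    return [(b - a) / eps for a, b in zip(Au, Aup)]


def A(u):
    """Hypothetical 2 x 2 nonlinear test system."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] - u[1] ** 3]


# The exact Jacobian at (1, 2) is [[2, 1], [1, -12]], so A(u; p) = (3, -11) for p = (1, 1).
d = directional_derivative(A, [1.0, 2.0], [1.0, 1.0])
print(d)
```

If `eps` is too large the truncation error dominates, and if it is too small rounding error dominates, which is why the scaling of $u$ and $A(u)$ matters.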

Given $u^n$, an approximate solution to Eqn. (C.1), one cycle of GMRES advances
the solution by first choosing $k$ orthonormal search directions $p_1, p_2, \ldots, p_k$ as follows:

$$\tilde p_1 = -A(u^n) \tag{C.3}$$

Normalize $\tilde p_1$

$$p_1 = \frac{\tilde p_1}{\|\tilde p_1\|} \tag{C.4}$$

For $j = 1, 2, \ldots, k - 1$ take

$$\tilde p_{j+1} = A(u^n; p_j) - \sum_{i=1}^{j} b_{ji}\,p_i \tag{C.5}$$

where

$$b_{ji} = (A(u^n; p_j), p_i)$$

so that

$$(\tilde p_{j+1}, p_i) = 0 \quad \text{for } i = 1, 2, \ldots, j$$

Normalize $\tilde p_{j+1}$

$$p_{j+1} = \frac{\tilde p_{j+1}}{\|\tilde p_{j+1}\|} \tag{C.6}$$

Now update $u^n$ using

$$u^{n+1} = u^n + \sum_{j=1}^{k} a_j\,p_j \tag{C.7}$$

The overall goal is to minimize $\|A(u^{n+1})\|$. To this end the coefficients $a_j$ are chosen to
solve the linearized version of the least squares problem

$$\Big\|A(u^n) + \sum_{j=1}^{k} a_j\,A(u^n; p_j)\Big\|^2 \simeq \Big\|A\Big(u^n + \sum_{j=1}^{k} a_j\,p_j\Big)\Big\|^2 = \|A(u^{n+1})\|^2 \tag{C.8}$$

Aided by the orthogonality of the search directions $p_j$, a modified version of the QR
algorithm described in [53] is used to solve this least squares problem.
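The cycle described by Eqns. (C.3) to (C.8) can be sketched in a few lines. This is only an illustration under stated assumptions, not the TRANAIR implementation: the differential is approximated by a one-sided difference, the orthogonalization is written in modified Gram-Schmidt form, and the least squares problem is solved through the normal equations rather than the modified QR algorithm of [53]. All function names and the test problem are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def gmres_cycle(A, u, k, eps=1e-7):
    """One cycle: up to k orthonormal directions, then the linearized update."""
    r = A(u)
    rnorm = math.sqrt(dot(r, r))
    if rnorm == 0.0:
        return u
    p = [[-ri / rnorm for ri in r]]              # Eqns. (C.3)-(C.4)
    w = []                                       # w[j] approximates A(u; p_j)
    for j in range(k):
        shifted = A(axpy(eps, p[j], u))
        w.append([(s - ri) / eps for s, ri in zip(shifted, r)])
        if j == k - 1:
            break
        q = w[j][:]
        for pi in p:                             # Eqn. (C.5), modified Gram-Schmidt
            q = axpy(-dot(q, pi), pi, q)
        qnorm = math.sqrt(dot(q, q))
        if qnorm < 1e-12:                        # no new direction available
            break
        p.append([qi / qnorm for qi in q])       # Eqn. (C.6)
    # Minimize ||A(u) + sum_j a_j A(u; p_j)||^2, Eqn. (C.8), via normal equations.
    gram = [[dot(wi, wj) for wj in w] for wi in w]
    rhs = [-dot(wi, r) for wi in w]
    a = solve_dense(gram, rhs)
    for aj, pj in zip(a, p):                     # Eqn. (C.7)
        u = axpy(aj, pj, u)
    return u


def A(u):
    """Hypothetical linear test problem M u - b."""
    M = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    return [dot(row, u) - bi for row, bi in zip(M, b)]


u1 = gmres_cycle(A, [0.0, 0.0, 0.0], k=3)
residual = math.sqrt(dot(A(u1), A(u1)))
print(residual)
```

For this 3 x 3 linear test problem, taking $k = N = 3$ spans the whole space, so a single cycle drives the residual to roundoff level. For a nonlinear $A$ the cycle would be repeated.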

One cycle of the GMRES algorithm is an approximation to one cycle of Newton’s
iteration. Indeed with Newton’s method one uses the linear approximation

$$A(u^n + p) \simeq A(u^n) + A(u^n; p) \tag{C.9}$$

to estimate a value of $p$ which will enable $A(u^n + p) = 0$. Eqn. (C.9) suggests that
the following linear equation be solved for $p$:

$$A(u^n) + A(u^n; p) = 0 \tag{C.10}$$

The naive way to do this is to compute the entries of the $N \times N$ matrix associated with
$A$ and directly solve the system of linear equations. This is enormously expensive if $N$
is at all large (as it is for all problems of practical interest). GMRES approximately
solves Eqn. (C.10) by finding the best possible solution over the $k$ dimensional linear
subspace spanned by the search directions $\langle p_1, p_2, \ldots, p_k \rangle$. Of course if $k = N$, then
GMRES would find the best possible solution to Eqn. (C.10) over the whole space
and would therefore compute the exact solution. Unfortunately this is every bit as
expensive as solving Eqn. (C.10) directly. The key to efficiency is to arrange for
GMRES to find a good solution to Eqn. (C.10) using only a small number of search
directions $k$. Preconditioning plays a vital role in achieving this goal.
C.2 PRECONDITIONING


The available mathematical theorems [53], [88], as well as much numerical experience, 
indicate that the rate at which GMRES converges, measured by the value of k required 
to achieve a given level of accuracy, depends on the distribution of eigenvalues. The 
more the eigenvalues are clustered together, the faster GMRES will converge. The 
process of replacing a given problem with another equivalent problem (i.e., one with 
the same solution) enjoying a more favorable distribution of eigenvalues is called 
preconditioning. 

Based on the observation that the identity operator has the most favorable
distribution of eigenvalues (all the eigenvalues are clustered at 1), most methods for
preconditioning invoke an approximate inverse to the operator in the equation one is
solving. For example, if $L$ is a linear operator and $N^{-1}$ is an approximate inverse to
$L$ then the problem

$$L(x) - b = 0 \tag{C.11}$$

is equivalent to

$$N^{-1}(L(x) - b) = 0. \tag{C.12}$$

In this case $N$ is called a preconditioner for $L$. These equations have the same
solution, but GMRES will more readily solve Eqn. (C.12) than Eqn. (C.11) because
the eigenvalues of $N^{-1}L$ are more tightly clustered than those of $L$.

A formulation which allows GMRES to take advantage of preconditioners already 
built into existing codes will now be described. 

Given a problem of the form 


A(u)=0 (C.13) 

most computer codes have a method $M$ which takes a good approximation to the
solution of Eqn. (C.13) and creates a better approximation. Typical methods $M$
might, for example, involve SLOR, ADI, a time marching scheme or even multigrid.
Whatever the method, $M$ already invokes an approximate inverse to the operator in
Eqn. (C.13). The standard procedure for updating $u$ is

$$u^{n+1} = M(u^n) \tag{C.14}$$

Convergence is achieved when $u^{n+1} = u^n$. Thus solving Eqn. (C.13) is equivalent to
solving

$$u - M(u) = 0 \tag{C.15}$$

Applying GMRES to the preconditioned Eqn. (C.15) is more effective than applying
GMRES to Eqn. (C.13) directly. Applying GMRES to Eqn. (C.15) is often
considerably more effective than employing the standard iteration procedure Eqn. (C.14)
[54].
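To make the reformulation in Eqn. (C.15) concrete, the sketch below wraps an existing update method $M$ so that it returns the residual $u - M(u)$. A single Jacobi sweep on a small linear system stands in for $M$ here purely as an assumption for illustration (the text mentions SLOR, ADI, time marching and multigrid). In that case $u - M(u) = D^{-1}(Su - b)$, i.e., the original residual multiplied by an approximate inverse.

```python
def make_jacobi(S, b):
    """Return M(u): one Jacobi sweep for S u = b (a stand-in for SLOR, ADI, ...)."""
    def M(u):
        return [ui - (sum(sij * uj for sij, uj in zip(row, u)) - bi) / row[i]
                for i, (row, bi, ui) in enumerate(zip(S, b, u))]
    return M


def preconditioned_residual(M, u):
    """Left-hand side of Eqn. (C.15): u - M(u)."""
    return [ui - mi for ui, mi in zip(u, M(u))]


S = [[4.0, 1.0], [1.0, 3.0]]     # hypothetical test matrix
b = [1.0, 2.0]
M = make_jacobi(S, b)

res_start = preconditioned_residual(M, [0.0, 0.0])                  # equals -D^{-1} b
res_exact = preconditioned_residual(M, [1.0 / 11.0, 7.0 / 11.0])    # ~0 at the solution
print(res_start, res_exact)
```

A solver that only needs residual evaluations can then be applied to `preconditioned_residual` without any knowledge of how `M` works internally.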
C.3 GMRES AND SIMILAR METHODS


As shown above, one cycle of GMRES finds the best possible solution to the following
linear equation for $p$:

$$A(u^n) + A(u^n; p) = 0 \tag{C.16}$$

over a $k$ dimensional subspace $\langle p_1, p_2, \ldots, p_k \rangle$. In GMRES the search directions
are chosen to be orthonormal. An alternative method, ORTHODIR, chooses the search
directions to be $A^T A$ orthogonal, i.e.,

$$(A(u^n; p_i), A(u^n; p_j)) = 0, \quad i \neq j \tag{C.17}$$

ORTHODIR computes the $k$ search directions as:

$$p_1 = -A(u^n)$$

for $j = 1, 2, \ldots, k - 1$ take

$$v_j = A(u^n; p_j)$$

$$p_{j+1} = v_j + \sum_{i=1}^{j} b_{ji}\,p_i$$

where

$$b_{ji} = -\frac{(A(u^n; v_j), A(u^n; p_i))}{(A(u^n; p_i), A(u^n; p_i))}$$

The coefficients $b_{ji}$ are computed to enforce Eqn. (C.17). As in GMRES, ORTHODIR
updates $u$ using the formula

$$u^{n+1} = u^n + \sum_{j=1}^{k} a_j\,p_j$$

where the $a_j$ are chosen to solve the linearized least squares problem

$$\Big\|A(u^n) + \sum_{j=1}^{k} a_j\,A(u^n; p_j)\Big\|^2 \simeq \Big\|A\Big(u^n + \sum_{j=1}^{k} a_j\,p_j\Big)\Big\|^2 = \|A(u^{n+1})\|^2 \tag{C.18}$$

Now because of Eqn. (C.17) this least squares problem can be explicitly solved:

$$a_j = -\frac{(A(u^n), A(u^n; p_j))}{(A(u^n; p_j), A(u^n; p_j))}$$


It can be easily shown that the search directions generated by ORTHODIR span
the same $k$ dimensional subspace as those generated by GMRES. Therefore GMRES
and ORTHODIR are mathematically equivalent. Since ORTHODIR solves the least
squares problem more conveniently, ORTHODIR would appear to be preferable to
GMRES. However, ORTHODIR requires that both the search directions $p_j$ and the
vectors $A(u^n; p_j)$ be stored. Also ORTHODIR uses $2k$ evaluations of $A$. GMRES on
the other hand requires only $k + 1$ evaluations of $A$ and that only the $p_j$ be stored.
(The solution to the least squares problem Eqn. (C.8) presented in reference [53]
makes use of Eqn. (C.5), which expresses $A(u^n; p_j)$ as a linear combination of the
search directions $p_j$, eliminating the need to explicitly store the vectors $A(u^n; p_j)$.)
Thus GMRES requires only about half the storage and half the number of function
evaluations as ORTHODIR. Moreover, evidence is given in reference [53] that GMRES
is less subject to numerical problems. Of all the methods for solving a nonsymmetric
linear system of equations based on the idea of finding the best possible solution over a
$k$ dimensional subspace (ORTHOMIN, ORTHORES and ORTHODIR are compared
in reference [53]), GMRES appears to be the best in terms of storage, operation count
and numerical stability.

For problems involving symmetric positive definite matrices, algorithms more
efficient than GMRES exist. Consider the linear system

$$Sx = b \tag{C.19}$$

where $S$ is a symmetric positive definite matrix. The CR (Conjugate Residual)
method employs the following relations [88].

Choose: $x_0$

Set: $r_0 = b - Sx_0$

and $p_0 = r_0$

Now recursively use

$$a_i = \frac{(r_i, Sr_i)}{(Sp_i, Sp_i)}$$

$$x_{i+1} = x_i + a_i\,p_i$$

$$r_{i+1} = r_i - a_i\,Sp_i$$

$$b_i = \frac{(r_{i+1}, Sr_{i+1})}{(r_i, Sr_i)}$$

$$p_{i+1} = r_{i+1} + b_i\,p_i$$

It then follows [88] that

$$(Sp_i, Sp_j) = 0 \quad \text{for } i \neq j$$

and that $x_k$ minimizes

$$\|Sx - b\|^2$$

over the $k$ dimensional affine space $x_0 + \langle p_0, p_1, \ldots, p_{k-1} \rangle$.
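The CR recursion above translates almost line by line into code. The following is a minimal sketch, assuming a small dense symmetric positive definite matrix stored as nested lists; the stopping tolerance and the test system are assumptions for illustration. The product $Sp$ is updated recursively so that only one matrix-vector product is needed per iteration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(S, x):
    return [dot(row, x) for row in S]


def conjugate_residual(S, b, x0, max_iter, tol=1e-12):
    """Conjugate Residual iteration for a symmetric positive definite S."""
    x = x0[:]
    r = [bi - si for bi, si in zip(b, matvec(S, x))]
    p = r[:]
    Sr = matvec(S, r)
    Sp = Sr[:]
    rSr = dot(r, Sr)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        a = rSr / dot(Sp, Sp)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * spi for ri, spi in zip(r, Sp)]
        Sr = matvec(S, r)
        rSr_new = dot(r, Sr)
        beta = rSr_new / rSr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Sp = [sri + beta * spi for sri, spi in zip(Sr, Sp)]   # S p without a new product
        rSr = rSr_new
    return x


S = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical SPD test matrix
b = [1.0, 2.0, 3.0]
x_cr = conjugate_residual(S, b, [0.0, 0.0, 0.0], max_iter=10)
print(x_cr)
```

The working vectors are `x`, `r`, `Sr`, `p` and `Sp`, independent of the number of iterations.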
Thus CR operates very much in the same spirit as GMRES (in fact for symmetric
positive definite linear systems they are mathematically equivalent), but explicit
orthogonalizations such as Eqn. (C.5) and the need for solving the least squares
problem (C.8) are avoided. Moreover storage is required for only the 5 vectors $x$, $r$, $Sr$, $p$
and $Sp$. This compares very favorably with the $k + 4$ stored vectors required by GMRES
(typically 20 vectors).

Another algorithm for solving Eqn. (C.19) is Conjugate Gradients (CG). It uses
the relations [88]:

Choose $x_0$. Set

$$r_0 = b - Sx_0 \tag{C.20}$$

and

$$p_0 = r_0 \tag{C.21}$$

Now recursively use

$$a_i = \frac{(r_i, r_i)}{(p_i, Sp_i)}$$

$$x_{i+1} = x_i + a_i\,p_i$$

$$r_{i+1} = r_i - a_i\,Sp_i$$

$$b_i = \frac{(r_{i+1}, r_{i+1})}{(r_i, r_i)}$$

$$p_{i+1} = r_{i+1} + b_i\,p_i$$

It then follows that [88]

$$(p_i, Sp_j) = 0 \quad \text{for } i \neq j$$

so that the search directions are (by definition) $S$ orthogonal (conjugate) to each
other. Also if $y$ is the exact solution to Eqn. (C.19) then $x_k$ minimizes [88]:

$$\|y - x\|$$

over the $k$ dimensional affine space $x_0 + \langle p_0, p_1, \ldots, p_{k-1} \rangle$.
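For comparison with CR, here is a minimal sketch of the CG recursion of Eqns. (C.20) and (C.21), again assuming a small dense symmetric positive definite matrix stored as nested lists. The tolerance and the test system are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(S, x):
    return [dot(row, x) for row in S]


def conjugate_gradient(S, b, x0, max_iter, tol=1e-12):
    """Conjugate Gradient iteration for a symmetric positive definite S."""
    x = x0[:]
    r = [bi - si for bi, si in zip(b, matvec(S, x))]   # Eqn. (C.20)
    p = r[:]                                           # Eqn. (C.21)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        Sp = matvec(S, p)
        a = rr / dot(p, Sp)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * spi for ri, spi in zip(r, Sp)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


S = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical SPD test matrix
b = [1.0, 2.0, 3.0]
x_cg = conjugate_gradient(S, b, [0.0, 0.0, 0.0], max_iter=10)
print(x_cg)
```

Each iteration needs one product with $S$ and a fixed number of work vectors, which is the storage and operation count advantage over GMRES discussed in the text.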

Again CG enjoys enormous advantages over GMRES in terms of storage and
operation count when applied to symmetric positive definite linear systems. Unfortunately
many of the advantages of CR and CG disappear when applied to nonsymmetric
problems. Let $G$ be a general (invertible) nonsymmetric matrix. Then

$$Gx = b \tag{C.22}$$

can be solved with CG or CR if one considers

$$G^*Gx = G^*b \tag{C.23}$$

This involves the added expense and inconvenience of computing the adjoint
operator $G^*$. The adjoints of any preconditioners must also be computed. This could
be nearly impossible if multigrid is used as a preconditioner. The elegant
formulation shown in Eqn. (C.15), which allows GMRES to be immediately retrofitted to
existing codes, is lost. Also the eigenvalues associated with Eqn. (C.23) are more
spread out than those of Eqn. (C.22). Indeed the eigenvalues of $G^*G$ are the squares
of those of $G$. This means that more iterations are required to solve Eqn. (C.23)
than to solve Eqn. (C.22). In fact conjugate gradient applied to Eqn. (C.23) with a
basic underlying iterative method as preconditioner is often no faster than the basic
underlying method, making acceleration of Eqn. (C.23) with any Krylov subspace
method a hopeless cause. In view of the difficulties associated with applying CG or
CR to Eqn. (C.23) one must ask how much it really costs to apply GMRES to
Eqn. (C.22) directly. In a typical application, the cost of GMRES turns out not to
be too burdensome. The operations required by GMRES are fully vectorizable over
vectors of length equal to the number of unknowns in the problem and are therefore
capable of efficient implementation. For these reasons it is believed that in most cases
applying GMRES to Eqn. (C.22) is preferable to using CG or CR on Eqn. (C.23).
Appendix D

POISSON SOLVER


D.1 SUMMARY OF THE POISSON SOLVER

The Poisson solver, denoted by $T^{-1}$, computes the perturbation potential $\phi$ from a
given source function $Q$ in a manner that automatically imposes the proper boundary
condition at infinity. The computation of $T^{-1}(Q)$ is based on the discrete convolution,
$G * Q$, of the sources $Q$ with the discrete free space Green’s function $G$. To get $T\phi = Q$,
the function $G$ must satisfy

$$(TG)(i,j,k) = \delta(i,j,k) = \begin{cases} 1 & \text{at } (0,0,0) \\ 0 & \text{everywhere else} \end{cases} \tag{D.1}$$

In addition $G$ must satisfy a discrete far field boundary condition. It is sufficient to
require that $G$ be asymptotic to the continuous free space Green’s function $-1/(4\pi r)$
as $r \to \infty$. In fact $G$ may be approximated to arbitrary accuracy as $r \to \infty$ by an
asymptotic expansion of the form

$$G(i,j,k) \sim -\frac{\Delta x\,\Delta y\,\Delta z}{4\pi}\Big(\frac{1}{r} + \frac{\eta_3}{r^3} + \frac{\eta_5}{r^5} + \cdots\Big) \tag{D.2}$$

Here $\eta_n = \eta_n(u, v, w, \Delta x, \Delta y, \Delta z)$ where

$$(u, v, w) = \Big(\frac{i\,\Delta x}{r}, \frac{j\,\Delta y}{r}, \frac{k\,\Delta z}{r}\Big) \quad \text{and} \quad r = |(i\,\Delta x, j\,\Delta y, k\,\Delta z)|. \tag{D.3}$$

A general recursion formula (D.42) for the asymptotic coefficients $\eta_n$ was derived
after considerable effort. James’ attempt [89] to derive the $\eta_3$ formula was consistent
with the Poisson operator but not with the “recursion” relations (D.11) that $G$ must
satisfy. The MIT computer algebra program MACSYMA was helpful in evaluating
$\eta_5$ and the corresponding downstream Green’s function calculations.

The Poisson solver permits symmetry about the $y$ and $z$ planes $j = 0$ and $k = 0$
and automatically includes downstream sources. These are the sources on the
downstream ($m_x$) plane of the computational box $R$. They are interpreted as extending to
infinity in the $x$ direction, so that wakes are automatically extended outside of $R$. A
corresponding downstream Green’s function $G_d$, see (D.4) below, is computed along
with the regular Green’s function $G$. A correct solution requires that the downstream
sources sum to zero; otherwise the solution is theoretically infinite everywhere, though
in practice small deviations from the zero sum are tolerable.

The James algorithm, see section D.1.2 below, is used for the convolution. For
an $N^3$ box, the real operation count of the TRANAIR implementation is
$(116 + 10\log_2 N)N^3 + O(\log_2 N)N^2$. In practical cases the $O(\log_2 N)N^3$ asymptotic term
is dominated by the large $O(N^3)$ term, and the $O(\log_2 N)N^2$ term is significant in
small cases. As the result of a considerable programming effort, the code is typically 2
times faster than a comparable implementation of the standard convolution algorithm,
whose operation count is $(72 + 35\log_2 N)N^3$. The Poisson solver achieved a rate of
480,000 grid points per second, or about 85 MFLOPS, on one processor of a Cray
X-MP for a grid with dimensions $(m_x, m_y, m_z) = (160, 150, 27)$.
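The quoted figures can be put side by side with a few lines of arithmetic. The sketch below evaluates only the leading $N^3$ terms of the two operation counts (the $O(\log_2 N)N^2$ term is ignored) and derives the solve time and work per grid point implied by the quoted rate; the function and variable names are illustrative only.

```python
import math


def tranair_ops(n):
    """Leading terms of the quoted TRANAIR count, (116 + 10 log2 N) N^3."""
    return (116 + 10 * math.log2(n)) * n ** 3


def standard_ops(n):
    """Quoted count for the standard convolution algorithm, (72 + 35 log2 N) N^3."""
    return (72 + 35 * math.log2(n)) * n ** 3


for n in (32, 64, 128):
    print(n, round(standard_ops(n) / tranair_ops(n), 3))

points = 160 * 150 * 27              # grid quoted in the text
seconds = points / 480_000           # time per solve at the quoted rate
flops_per_point = 85e6 / 480_000     # work per grid point implied by 85 MFLOPS
print(points, seconds, round(flops_per_point, 1))
```

On operation counts alone the ratio is roughly 1.5 to 1.7 for these $N$. The implied work of about 177 operations per grid point is consistent with $116 + 10\log_2 N$ for $N$ near 64.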

The memory requirement of the Poisson solver (including Green’s function) is
$N^3 + 51N^2$, which is $N^3$ asymptotically but closer to $2N^3$ in practice. This is due
to the many scratch planes required for good vectorization of the dominant phase 2
of the algorithm. An earlier code, which was closer to James’ technique, used much
less memory in phase 2 but was much slower and more complicated. The standard
algorithm uses $2N^3 + 23N^2$ words, which is asymptotically twice as much but is less
than 1/3 more in typical cases.

Radix 2,3,5 FFT’s (Fast Fourier Transforms) are used to achieve more flexibility
in choosing $(m_x, m_y, m_z)$ than the traditional radix 2 FFT, permitting a smaller
computational box.
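The benefit of mixed radices can be seen by comparing the admissible lengths. The sketch below is an illustration only: it assumes that a radix 2,3,5 transform accepts any length whose prime factors are 2, 3 and 5, and it says nothing about how TRANAIR maps grid dimensions to transform lengths.

```python
def is_235_smooth(n):
    """True if n has no prime factors other than 2, 3 and 5."""
    if n < 1:
        return False
    for prime in (2, 3, 5):
        while n % prime == 0:
            n //= prime
    return n == 1


def next_radix_235_size(n):
    """Smallest length >= n admissible for a radix 2,3,5 transform."""
    while not is_235_smooth(n):
        n += 1
    return n


def next_power_of_two(n):
    """Smallest power of two >= n (the radix 2 restriction)."""
    size = 1
    while size < n:
        size *= 2
    return size


for m in (27, 150, 160, 161):
    print(m, next_radix_235_size(m), next_power_of_two(m))
```

For example, a dimension of 150 is admissible as it stands, whereas a radix 2 transform would require padding to 256.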

D.1.1 Summary of the Green’s Function Algorithm 

Buneman [90] found an analytical method for generating the 2D discrete Green’s
function in the case $\Delta x = \Delta y$. For the 3D case a new, semi-analytical method has
been developed for arbitrary $\Delta x$, $\Delta y$, $\Delta z$. This method may be adapted to 2D and
to higher dimensions as well.

The basic idea is to first compute Green’s function data on the boundary of the
box $R$ and then solve for $G$ in the interior by FFT techniques for a Poisson boundary
value problem (see section D.3.1). Neumann boundary value data are used because
a Neumann boundary value problem is solved by cosine transforms. For data that
are symmetric about the origin on each axis, as they are for $G$, cosine transforms give
the DFT of the solution. It is actually the DFT of $G$, denoted by $\hat{G}$, that is needed
for the convolution. Thus by solving a Neumann boundary value problem, $G$ itself
need never be computed, only $\hat{G}$.
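
The statement that cosine transforms give the DFT of symmetric data can be checked in one dimension. The sketch below is an illustration under stated assumptions: it uses NumPy, random test data, and a hypothetical helper `cosine_transform`, and it compares a cosine sum over samples $0, \ldots, m$ with the DFT of the even extension of length $2m$.

```python
import numpy as np

def cosine_transform(g):
    # g[0] + (-1)^a g[m] + 2 * sum_{j=1}^{m-1} g[j] cos(pi a j / m), a = 0..m
    m = len(g) - 1
    a = np.arange(m + 1)
    j = np.arange(1, m)
    interior = np.cos(np.pi * np.outer(a, j) / m) @ g[1:m]
    return g[0] + (-1.0) ** a * g[m] + 2.0 * interior

rng = np.random.default_rng(0)
m = 8
g = rng.standard_normal(m + 1)

# Even extension: symmetric about index 0 when viewed periodically.
even_ext = np.concatenate([g, g[m - 1:0:-1]])
dft = np.fft.fft(even_ext)

print(np.max(np.abs(dft.imag)))
print(np.max(np.abs(dft.real[:m + 1] - cosine_transform(g))))
```

Both printed numbers should be at rounding level: the DFT of the symmetric extension is real and coincides with the cosine sum.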

Symmetry determines the Neumann boundary data to be zero at the three boundary
planes of $R$ through the origin: $i = 0$, $j = 0$, and $k = 0$. The hard part is, of
course, the boundary data for the three exterior planes $i = m_x$, $j = m_y$, $k = m_z$.
The boundary data for the first of these planes, for example, must be computed as
$(G(m_x + 1, j, k) - G(m_x - 1, j, k))/(2\Delta x)$, cosine transformed in $j$ and $k$. If $m_x$ is
large, this may be computed accurately by the asymptotic expansion (D.2). For this
reason the problem is first reformulated, if necessary, for a large box, whose dimensions
depend on the available scratch memory and on an attempt to equalize the
radial dimensions $m_x\Delta x$, $m_y\Delta y$, and $m_z\Delta z$. Then this large box Neumann problem
is partially solved to get the desired boundary data for the given box $R$.

The total CPU time for computing the Green’s function data is less than .2 seconds 
using 100,000 words of scratch memory, corresponding to an operation count of 18 
times the volume of the large box. This time is trivial since this data is computed 
only once and stored, to be read back at each subsequent call to the convolution 
algorithm. The data is accurate to about 9 or 10 significant digits, degrading only at 
high ratios of the deltas. The convolution algorithm preserves this accuracy. 

The downstream Green’s function $G_d$ is defined by

$$G_d(i,j,k) = \sum_{l=m_x}^{\infty} G(l-i,j,k). \tag{D.4}$$

Formula (D.4) is derived from the requirement that if $Q(i,j,k) = 0$ for $i < m_x$ and
$Q(i,j,k) = \tau(j,k)$ for $i \ge m_x$, then

$$(G * Q)(i,j,k) = (G_d(i,\cdot,\cdot) * \tau)(j,k).$$

The portion of each sum (D.4) outside the large box is computed by the Euler-Maclaurin
formula for an infinite sum, with $G$ evaluated by the expansion (D.2).
Each such sum actually has an infinite part (see section D.2.4). But this part may
be subtracted off because it is asymptotically equal for all $(j,k)$ and will disappear
upon convolution with the downstream sources, provided that they sum to zero.

The result of solving the Neumann problem for $\hat{G}$ is a simple algebraic formula in
terms of three planes. These are the cosine transforms of the three exterior planes of
boundary data (see section D.2.3). For $\hat{G}_d$, four planes are required since there is no
symmetry in the x direction. Evaluation of these formulas typically adds about 5%
to the CPU time of a convolution if there is no symmetry but saves two grid boxes
of memory. If there is both y and z symmetry, the cost may rise to 25% of the CPU
time but save eight grid boxes of memory.

D.1.2 Summary of the Convolution Algorithm 

The procedure developed by R. A. James [89] is based on the decomposition of the
perturbation potential $\phi$ into an “interior solution” $\theta$ and an “exterior solution” $\psi$,
mediated by a “shielding charge” function $\sigma$. That is, write the Green’s function
solution $\phi = G * Q$ to the Poisson equation $T\phi = Q$ in the form

$$\phi = \theta + \psi$$

where $\theta$ and $\psi$ are defined by

$$T\theta = Q \text{ inside the box } R \text{ with } \theta = 0 \text{ on the boundary } \partial R \text{ and outside } R, \text{ and}$$

$$\psi = G * \sigma \text{ for } \sigma = Q - T\theta \ne 0 \text{ only on } \partial R.$$

Intuitively, the interior solution $\theta$ would result if the sources in $R$ were shielded from
everything outside by a wall at $\partial R$, for example. Subtracting $\sigma$ from the given sources
$Q$ on $\partial R$ gives induced sources, or charges, $T\theta$ on $\partial R$. By definition these charges
have the same effect as a wall, hence the name shielding charge function. The exterior
solution is so called because it incorporates the “exterior”, or free-space, boundary
condition.

A mathematical proof for the correctness of this decomposition is given by the
following algebraic argument. Let $\theta$ and $Q$ be functions defined on the 3D grid that are
zero outside $R$ and define $\sigma = Q - T\theta$ and $\psi = G * \sigma$. Then the discrete convolutions
$G * Q$ and $G * \sigma$ are finite sums, and it is easy to verify that $G * (T\theta) = T(G * \theta)$.
Hence

$$\phi = G * Q = G * (T\theta) + G * \sigma = T(G * \theta) + \psi = \theta + \psi. \tag{D.5}$$

This algorithm involves three phases. For phase 1, solve $T\theta = Q$ inside $R$ given
$\theta = 0$ on $\partial R$ and calculate $\sigma = Q - T\theta$, which is nonzero only on $\partial R$. For phase 2,
compute $\psi = G * \sigma$ on $\partial R$ by the standard convolution algorithm (described below).
For phase 3, complete the computation of $\psi$ by solving another Dirichlet Poisson
problem: $T\psi = 0$ inside $R$ given $\psi$ on $\partial R$.
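
The decomposition can be exercised on a one-dimensional analog. The sketch below is not the TRANAIR implementation: it assumes unit grid spacing, uses the 1D free-space Green’s function $G(i) = |i|/2$ of the second difference operator (an assumption introduced for illustration), and evaluates $\psi$ directly at every point, so phases 2 and 3 collapse into one step.

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
Q = rng.standard_normal(n + 1)            # sources on R = {0, ..., n}
idx = np.arange(n + 1)

def G(i):
    # 1D free-space Green's function: G(i+1) - 2 G(i) + G(i-1) = delta(i)
    return 0.5 * np.abs(i)

# Phase 1: Dirichlet solve T(theta) = Q inside R, theta = 0 on the boundary.
A = (np.diag(-2.0 * np.ones(n - 1))
     + np.diag(np.ones(n - 2), 1)
     + np.diag(np.ones(n - 2), -1))
theta = np.zeros(n + 1)
theta[1:n] = np.linalg.solve(A, Q[1:n])

# Shielding charge sigma = Q - T(theta), with theta = 0 outside R.
padded = np.concatenate([[0.0], theta, [0.0]])
T_theta = padded[2:] - 2.0 * padded[1:-1] + padded[:-2]
sigma = Q - T_theta

# Exterior solution psi = G * sigma; only the two boundary charges contribute.
psi = G(idx - 0) * sigma[0] + G(idx - n) * sigma[n]

# Reference: direct free-space convolution G * Q.
phi_direct = np.array([np.sum(G(i - idx) * Q) for i in idx])

print(np.max(np.abs(sigma[1:n])))               # interior charges vanish
print(np.max(np.abs(theta + psi - phi_direct)))  # theta + psi reproduces G * Q
```

In three dimensions the boundary charges live on the six faces of the box, which is why phase 2 is so much cheaper than a full convolution.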

The efficiency of this algorithm comes in part from the fact that phase 2, $G * \sigma$,
is much less work than $G * Q$ if done intelligently. Likewise, in phase 3 it is possible
to take advantage of the zero interior sources. There is also a way to do the last
part of the Dirichlet Poisson algorithm only once, instead of twice. Finally, the
Dirichlet procedure is very efficient since it uses FFT sine transforms and a special
tridiagonal solver (section D.3.1). Where there is a plane of symmetry, a mixed
Dirichlet-Neumann problem results. This requires the use of “shifted sine or cosine”
FFT transform algorithms to replace the sine transform (section D.3.2).

The standard FFT convolution procedure for G * Q would be to apply FFT’s in 
the x, y, and z directions to G and Q, multiply these complex results pointwise, then 
inverse transform. For this to be correct, both G and Q must be doubled in size along 
each axis, and this extension must be zero filled, with FFT’s for these doubled lengths. 
The reason for this is that the FFT actually implements a “circular convolution” [91]. 
The zeros guarantee that this convolution equals the standard linear convolution. 
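
The role of the zero fill can be seen in a small one-dimensional experiment. The sketch below assumes NumPy and random one-sided sequences (a simplification of the symmetric $G$ used in the text). It compares the FFT product with and without doubling the transform length.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
q = rng.standard_normal(n)
g = rng.standard_normal(n)

# Reference linear convolution, length 2n - 1.
direct = np.convolve(g, q)

# Doubled, zero-filled transforms: np.fft.fft(x, size) pads x with zeros.
size = 2 * n
padded = np.fft.ifft(np.fft.fft(g, size) * np.fft.fft(q, size)).real

# Without padding the product corresponds to a circular convolution.
circular = np.fft.ifft(np.fft.fft(g) * np.fft.fft(q)).real

print(np.max(np.abs(padded[:2 * n - 1] - direct)))  # rounding level
print(np.max(np.abs(circular - direct[:n])))        # wrap-around error
```

The unpadded result differs from the linear convolution because the tail of the product wraps around onto the first entries.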

The standard algorithm is adapted to phase 2 of the James algorithm in the following
way. Since $\sigma$ is restricted to $\partial R$, only the 6 boundary planes of $R$ need be
transformed directly. Then $\hat{\sigma}$ is constructed from these 6 planes, an xy plane (complex
and quadrupled in size) at a time, along with the corresponding plane of $\hat{G}$. The
values in these two planes are multiplied, and summations specified by the inverse
transform formula are performed to get 6 output boundary planes. At the end, the
inverses of the initial transformations are applied to these 6 planes. See section D.3.3
for explicit formulas for handling the 6 planes.

D.2 THEORY OF GREEN’S FUNCTION ALGORITHM


D.2.1 The Green’s Function Definition 

The Green’s function $G$ must satisfy the discrete Poisson equation

$$D^2 G(i,j,k) = \delta(i,j,k) \tag{D.6}$$

where

$$\begin{aligned}
D^2 G(i,j,k) ={}& \frac{G(i+1,j,k) - 2G(i,j,k) + G(i-1,j,k)}{\Delta x^2} \\
&+ \frac{G(i,j+1,k) - 2G(i,j,k) + G(i,j-1,k)}{\Delta y^2} \\
&+ \frac{G(i,j,k+1) - 2G(i,j,k) + G(i,j,k-1)}{\Delta z^2}.
\end{aligned}$$

In addition we assume that $G$ satisfies a free space boundary condition of the form

$$G(i,j,k) = K\left(\frac{1}{r} + f(x,y,z)\right) \tag{D.7}$$

where $K$ is a constant to be determined, $(x,y,z) = (i\Delta x, j\Delta y, k\Delta z)$, $r = |(x,y,z)|$,
and $f$ has continuous partial derivatives with asymptotic condition $r^2\nabla f \sim 0$ when
$r \sim \infty$. Here “$x \sim 0$” means that $x$ is infinitesimal (in the sense of nonstandard
analysis [92]), “$x \sim \infty$” that $x^{-1} \sim 0$, and “$x \sim y$” that $x - y \sim 0$.

Mathematically it is not clear that a solution to (D.6) and (D.7) even exists, or if 
one exists, that it is unique. Therefore it is better to define G in a way that makes 
existence obvious, and then derive (D.6), (D.7), and show uniqueness. The form of 
the free space condition (D.7) permits a straightforward uniqueness proof (later in 
this section). 

First define the discrete Green’s function as the coefficients of a certain 3D Fourier
series:

$$G(i,j,k) = \frac{1}{8\pi^3}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}
\frac{\exp\{(i\alpha + j\beta + k\gamma)\mathrm{i}\}}{h(\alpha,\beta,\gamma)}\,d\alpha\,d\beta\,d\gamma \tag{D.8}$$

where

$$h(\alpha,\beta,\gamma) = -\frac{4}{\Delta x^2}\sin^2(\alpha/2) - \frac{4}{\Delta y^2}\sin^2(\beta/2) - \frac{4}{\Delta z^2}\sin^2(\gamma/2). \tag{D.9}$$

The function $h^{-1}$ is singular at the origin, but this singularity is integrable by changing
to spherical coordinates since the Jacobian is $O(\rho^2)$ and $h(\alpha,\beta,\gamma) = O(\rho^2)$ for
$\rho \sim 0^+$. Therefore, according to Zygmund [93], the 3D Fourier series with coefficients
$G$ converges to $h^{-1}$ in the $L^1$ norm, and also almost everywhere in, for example, the
following form of “Abel convergence”

$$h^{-1}(\alpha,\beta,\gamma) = \lim_{r\to 1^-}\sum_{i,j,k} G(i,j,k)\,r^{|i|+|j|+|k|}\exp\{(i\alpha + j\beta + k\gamma)\mathrm{i}\}. \tag{D.10}$$

That the definition (D.8) satisfies the Poisson equation (D.6) is easily verified by
calculations such as the following.

$$\begin{aligned}
e^{(j+1)\beta\mathrm{i}} - 2e^{j\beta\mathrm{i}} + e^{(j-1)\beta\mathrm{i}}
&= \left(e^{\beta\mathrm{i}} + e^{-\beta\mathrm{i}} - 2\right)e^{j\beta\mathrm{i}} \\
&= 2(\cos(\beta) - 1)\,e^{j\beta\mathrm{i}} \\
&= -4\sin^2(\beta/2)\,e^{j\beta\mathrm{i}}.
\end{aligned}$$
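
The same verification can be run numerically in all three directions at once. The sketch below assumes that $h$ has the form $-4\sin^2(\cdot/2)/\Delta^2$ summed over the three axes, as implied by the identity above, and checks that applying the second difference operator to a single Fourier mode multiplies it by $h$. All function names and test values are hypothetical.

```python
import numpy as np

def h(a, b, c, dx, dy, dz):
    # Symbol of the discrete Laplacian for the mode exp{(i a + j b + k c) i}.
    return -4.0 * (np.sin(a / 2) ** 2 / dx**2
                   + np.sin(b / 2) ** 2 / dy**2
                   + np.sin(c / 2) ** 2 / dz**2)

def d2(f, i, j, k, dx, dy, dz):
    # Seven-point second difference operator D^2 applied to f at (i, j, k).
    return ((f(i + 1, j, k) - 2 * f(i, j, k) + f(i - 1, j, k)) / dx**2
            + (f(i, j + 1, k) - 2 * f(i, j, k) + f(i, j - 1, k)) / dy**2
            + (f(i, j, k + 1) - 2 * f(i, j, k) + f(i, j, k - 1)) / dz**2)

a, b, c = 0.7, -1.3, 2.1          # arbitrary test frequencies
dx, dy, dz = 1.0, 0.5, 2.0        # arbitrary grid spacings

def mode(i, j, k):
    return np.exp(1j * (i * a + j * b + k * c))

lhs = d2(mode, 3, -2, 5, dx, dy, dz)
rhs = h(a, b, c, dx, dy, dz) * mode(3, -2, 5)
print(abs(lhs - rhs))
```

Dividing each mode by $h$ before integrating therefore produces a function whose second difference is the Kronecker delta.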

Other important properties of (D.8) are the recursion relations:

$$\frac{G(i+1,j,k) - G(i-1,j,k)}{i\,\Delta x^2}
= \frac{G(i,j+1,k) - G(i,j-1,k)}{j\,\Delta y^2}
= \frac{G(i,j,k+1) - G(i,j,k-1)}{k\,\Delta z^2}. \tag{D.11}$$

These are verified by using integration by parts to reduce each of the three differences
to a common value. For example, the first difference in (D.11) may be integrated by
parts in $\alpha$ to give
8^ Jl £. /_" e (,a+J/J+ ^ )i da d /3 d 1 


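since $e^{(i+1)\alpha\mathrm{i}} - e^{(i-1)\alpha\mathrm{i}} = 2\mathrm{i}\sin(\alpha)\,e^{i\alpha\mathrm{i}} = -\Delta x^2\,\mathrm{i}\,h_\alpha e^{i\alpha\mathrm{i}}$ from (D.9).
These relations are first order finite difference analogs of the differential relations

$$\frac{1}{x}\frac{\partial G}{\partial x} = \frac{1}{y}\frac{\partial G}{\partial y} = \frac{1}{z}\frac{\partial G}{\partial z}.$$
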
If the definition (D.8) is interpreted as an iterated integral, one of the three integrals
may be done analytically. For example, suppose that $k > 0$ and do the $\gamma$
integral by writing it as a contour integral along the infinite rectangle in the upper
half complex plane from $-\pi + \infty\mathrm{i}$ to $-\pi$ to $+\pi$ to $+\pi + \infty\mathrm{i}$ and back. Except for
$\alpha = \beta = 0$, there will be exactly one pole inside the contour. This is at the point $\gamma_0\mathrm{i}$
which satisfies $h(\alpha,\beta,\gamma_0\mathrm{i}) = 0$. Its residue is given by

$$\frac{1}{h_\gamma(\alpha,\beta,\gamma_0\mathrm{i})} = \frac{\Delta z^2\,\mathrm{i}}{2\sinh(\gamma_0)}$$

where (D.9) shows that $\gamma_0 = \gamma_0(\alpha,\beta)$ is defined by

$$\sinh^2(\gamma_0(\alpha,\beta)/2) = \frac{\Delta z^2}{\Delta x^2}\sin^2(\alpha/2) + \frac{\Delta z^2}{\Delta y^2}\sin^2(\beta/2). \tag{D.12}$$

Therefore the triple integral in (D.8) may be reduced to the double integral

$$\begin{aligned}
G(i,j,k) &= -\frac{\Delta z^2}{8\pi^2}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}
\frac{\exp\{(i\alpha + j\beta)\mathrm{i} - k\gamma_0(\alpha,\beta)\}}{\sinh(\gamma_0(\alpha,\beta))}\,d\alpha\,d\beta \\
&= -\frac{\Delta x\Delta y\Delta z}{8\pi^2}\int_{-\pi/\Delta x}^{\pi/\Delta x}\int_{-\pi/\Delta y}^{\pi/\Delta y}
\frac{\exp\{((u\chi + v\psi)\mathrm{i} - w\zeta)r\}}{\sinh(\gamma)/\Delta z}\,d\chi\,d\psi
\end{aligned} \tag{D.13}$$

where $(u,v,w) = (x/r, y/r, z/r)$, $\gamma = \gamma_0(\alpha,\beta)$, and $(\chi,\psi,\zeta) = (\alpha/\Delta x, \beta/\Delta y, \gamma/\Delta z)$
are normalized coordinates. Numerical integration of (D.13) is possible, but expensive
for large $i$, $j$, or $k$.

That the free space condition (D.7) is satisfied by definition (D.8) follows from
the development of the asymptotic expansion of (D.13) later in section D.2.2. It is
already obvious from (D.8) that $G$ is infinitely differentiable as a function of $x, y, z$
so that the function $f$ in (D.7) has the required smoothness.

Now let us present the uniqueness proof for condition (D.7). The proof is based on
the discrete divergence theorem. Let $g = (g_1, g_2, g_3)$ be a function defined on a grid
box $R = [b_x, e_x] \times [b_y, e_y] \times [b_z, e_z]$. Also let $D_f = (D_{f1}, D_{f2}, D_{f3})$ denote the forward
difference operators in $x, y, z$; e.g., $(D_{f1}g_1)(i,j,k) = (g_1(i+1,j,k) - g_1(i,j,k))/\Delta x$.
Then just do all possible cancellations to get

$$\begin{aligned}
\sum_{R^\circ}(D_f\cdot g)(i,j,k)\,\Delta x\Delta y\Delta z ={}&
\sum_{j,k\in R^\circ_x}\left(g_1(e_x,j,k) - g_1(b_x+1,j,k)\right)\Delta y\Delta z \\
&+ \sum_{i,k\in R^\circ_y}\left(g_2(i,e_y,k) - g_2(i,b_y+1,k)\right)\Delta x\Delta z \\
&+ \sum_{i,j\in R^\circ_z}\left(g_3(i,j,e_z) - g_3(i,j,b_z+1)\right)\Delta x\Delta y
\end{aligned}$$

where $R^\circ = [b_x+1, e_x-1]\times[b_y+1, e_y-1]\times[b_z+1, e_z-1]$, etc. This statement
may be abbreviated by letting $\Delta S$ represent the boundary deltas, $\Delta R = \Delta x\Delta y\Delta z$,
and $n$ = the outward unit boundary normal vector, to get

$$\sum_{R^\circ} D_f\cdot g\,\Delta R = \sum_{\partial R^\circ} g\cdot n\,\Delta S. \tag{D.14}$$
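
The telescoping identity behind the discrete divergence theorem is easy to confirm numerically. The sketch below assumes NumPy, an arbitrary small box whose array indices $0, \ldots, n-1$ play the role of $[b, e]$ in each direction, and random test data. It compares the volume sum of the forward-difference divergence over the interior with the corresponding face sums.

```python
import numpy as np

rng = np.random.default_rng(3)
dx, dy, dz = 0.5, 1.0, 2.0
nx, ny, nz = 6, 5, 7
g1, g2, g3 = (rng.standard_normal((nx, ny, nz)) for _ in range(3))

# Interior index ranges [b + 1, e - 1] in each direction.
si, sj, sk = slice(1, nx - 1), slice(1, ny - 1), slice(1, nz - 1)

# Forward-difference divergence evaluated on the interior points.
div = ((g1[2:, sj, sk] - g1[1:-1, sj, sk]) / dx
       + (g2[si, 2:, sk] - g2[si, 1:-1, sk]) / dy
       + (g3[si, sj, 2:] - g3[si, sj, 1:-1]) / dz)
volume_sum = div.sum() * dx * dy * dz

# Face terms: g(e) - g(b + 1) on each pair of opposite faces.
surface_sum = ((g1[-1, sj, sk] - g1[1, sj, sk]).sum() * dy * dz
               + (g2[si, -1, sk] - g2[si, 1, sk]).sum() * dx * dz
               + (g3[si, sj, -1] - g3[si, sj, 1]).sum() * dx * dy)

print(abs(volume_sum - surface_sum))
```

The two sums agree to rounding error for any data, since every interior contribution cancels in pairs.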

Now let $(e_x, e_y, e_z) = -(b_x, b_y, b_z) = (n, n, n)$ and apply (D.14) to the backward
difference operators $D_bG = (D_{b1}, D_{b2}, D_{b3})G$, defined by
$(D_{b1}G)(i,j,k) = (G(i,j,k) - G(i-1,j,k))/\Delta x$, etc. By (D.6) and (D.14) the result is

$$\begin{aligned}
\Delta R = \sum_{R^\circ} D^2G\,\Delta R &= \sum_{\partial R^\circ} D_bG\cdot n\,\Delta S \\
&= 8\Bigg[\,{\sum_{j=0}^{n-1}}'\,{\sum_{k=0}^{n-1}}'(D_{b1}G)(n,j,k)\,\Delta y\Delta z
+ {\sum_{i=0}^{n-1}}'\,{\sum_{k=0}^{n-1}}'(D_{b2}G)(i,n,k)\,\Delta x\Delta z \\
&\qquad\; + {\sum_{i=0}^{n-1}}'\,{\sum_{j=0}^{n-1}}'(D_{b3}G)(i,j,n)\,\Delta x\Delta y\Bigg]
\end{aligned}$$

where $\sum'$ denotes a summation with a factor of $\tfrac{1}{2}$ on the summand for the lower limit
of summation. Next apply Taylor’s theorem with remainder to (D.7) to get, for example,

$$(D_{b1}G)(n,j,k) = K\left(-\frac{x}{r^3} + \nabla f\right)$$

where the right side is evaluated at some point between $(n-1,j,k)$ and $(n,j,k)$.
When summed over the boundary the $\nabla f$ term drops out for $n \sim \infty$, since $r = O(n)$
and $r^2\nabla f \sim 0$ by assumption. That is, the constant $K$ is determined independently
of $f$ by summing the $O(1/r^2)$ terms over the boundary.

Next we show that $f$ in (D.7) is uniquely determined by another application of
the discrete divergence theorem. Suppose that $f$ and $f'$ are two different functions
satisfying (D.7) and let $g = f - f'$. If $D_fg = 0$ everywhere, then $g$ must be constant,
and by the $r^2\nabla f \sim 0$ condition this constant must be zero. Therefore we may
assume, for example, that $(D_{f1}g)(i^*,j^*,k^*) \ne 0$ and define $h(i,j,k) = 0$ if $i \le i^*$ and
$h(i,j,k) = \Delta x$ if $i \ge i^* + 1$. Now apply (D.14) to $(D_bg)h$ to get

$$\sum_{R^\circ} D_f\cdot[(D_bg)h]\,\Delta R = \sum_{\partial R^\circ} h(D_bg)\cdot n\,\Delta S \sim 0$$

since $r^2D_bg \sim 0$. On the other hand, simple algebra gives the following product rule
for discrete differentiation

$$\begin{aligned}
\sum_{R^\circ} D_f\cdot[(D_bg)h]\,\Delta R &= \sum_{R^\circ}(D^2g)h\,\Delta R + \sum_{R^\circ}(D_fg)\cdot(D_fh)\,\Delta R \\
&= 0 + (D_{f1}g)(i^*,j^*,k^*)\,\Delta R \ne 0.
\end{aligned}$$

This contradiction proves that $g = 0$, hence the uniqueness of $f$.

There is a condition equivalent to the traditional free space condition (D.7) that
is more natural from a mathematical point of view. Simply assume that the Fourier
series (D.10) converges in the $L^1$ norm. Then we demonstrate that the full series
converges in $L^1$ to $h^{-1}$, hence that (D.8) holds by Fourier series inversion. For notational
convenience, the argument is illustrated in the $y$ dimension, though at least three
dimensions are required for correctness. First scale the Poisson equation by $r^{|j|}e^{j\beta\mathrm{i}}$,
and sum over $j$, using (D.6), (D.10), $G$ symmetry, and trig identities to get

$$\begin{aligned}
1 &= \sum_{j=-\infty}^{\infty}(D^2G)(j)\,r^{|j|}e^{j\beta\mathrm{i}} \\
&= 2(G(1) - G(0)) + 2\sum_{j=1}^{\infty}\left(G(j-1) - 2G(j) + G(j+1)\right)r^j\cos j\beta \\
&= 2\left[G(1) - g(r,\beta) + \sum_{j=0}^{\infty}G(j)\,r^{j+1}\cos(j+1)\beta + \sum_{j=2}^{\infty}G(j)\,r^{j-1}\cos(j-1)\beta\right] \\
&= \left((r + r^{-1})\cos\beta - 2\right)g(r,\beta) + (r^{-1} - r)(\sin\beta)\,\tilde{g}(r,\beta) + (r - r^{-1})(\cos\beta)\,G(0)
\end{aligned}$$

where $g(r,\beta) = \sum_{j=-\infty}^{\infty}G(j)\,r^{|j|}e^{j\beta\mathrm{i}}$ and $\tilde{g}(r,\beta) = 2\sum_{j=1}^{\infty}G(j)\,r^j\sin j\beta$. By assumption
both $g(r,\beta)$ and $\tilde{g}(r,\beta)$ converge in $L^1$ to limits $g(\beta)$ and $\tilde{g}(\beta)$ as $r \to 1^-$, so $g$
satisfies $1 = 2(\cos(\beta) - 1)g(\beta) = -4\sin^2(\beta/2)g(\beta)$ almost everywhere.

It can also be demonstrated directly from corresponding Fourier integral results
that all terms of the asymptotic expansion (D.2) have Fourier series, in the sense of
(D.10), that are $L^1$ summable.

D.2.2 The Asymptotic Expansion

There are two ways of deriving the asymptotic expansion. One way is to apply 
the appropriate integration techniques directly to the double integral (D.13) above. 
Another way is to assume the form of the expansion, plug this into the Poisson
equation (D.6) and the recursion relations (D.11), and use Taylor series.

The first approach demonstrates the existence of an asymptotic expansion of the 
Fourier series definition (D.8). In particular, it establishes the free space condition 
(D.7) and the value of the scale factor. Then by the uniqueness argument in sec- 
tion D.2.1 above, the second approach is justified. The latter method is algebraically 
easier. 

First Approach 

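In the first approach, the coordinates of the double integral (D.13) are transformed
to polar coordinates $\theta, \rho$. Asymptotic evaluation of (D.13) reduces to the
computation of

$$I(x,y,z) = \int_0^{2\pi}\int_0^{\rho^*}\frac{e^{-(w\zeta(\rho,\theta) - q(\theta)\rho\mathrm{i})r}}{\sinh(\gamma(\rho,\theta))/\Delta z}\,\rho\,d\rho\,d\theta \tag{D.15}$$

where $r \sim \infty$, $\rho^* > 0$ with $\rho^* \not\sim 0$, and $q(\theta) = u\cos(\theta) + v\sin(\theta)$. That is,
$r^n(G(i,j,k) + (\Delta x\Delta y\Delta z/8\pi^2)I(x,y,z)) \sim 0$ for all finite $n > 0$. Note that $w > 0$
was assumed in (D.13).

To evaluate the inner integral, change the radial coordinate $\rho$ to the complex
coordinate

$$\begin{aligned}
\nu = \nu(\rho,\theta) &= w\zeta - q\rho\mathrm{i} \\
d\nu &= (w\zeta_\rho - q\mathrm{i})\,d\rho.
\end{aligned} \tag{D.16}$$

To get $\zeta_\rho$, differentiate (D.12) and use the notation $s_z = \sinh(\gamma)/\Delta z$, $c_z = \cosh(\gamma)$,
$s_x = \sin(\alpha)/\Delta x$, $c_x = \cos(\alpha)$, $s_y = \sin(\beta)/\Delta y$, $c_y = \cos(\beta)$, $c_\theta = \cos(\theta)$,
$s_\theta = \sin(\theta)$, $t(\rho,\theta) = s_xc_\theta + s_ys_\theta$, and $D(\rho,\theta) = wt - s_zq\mathrm{i}$. Then $\zeta_\rho = t/s_z$, so that

$$\rho'(\nu) = \frac{s_z}{D}, \qquad \zeta'(\nu) = \frac{t}{D}. \tag{D.17}$$

Thus from (D.15)

$$I = \int_0^{2\pi}\int_0^{\nu^*} e^{-\nu r}f_\theta(\nu)\,d\nu\,d\theta \quad\text{for}\quad f_\theta(\nu) = \frac{\rho}{D}. \tag{D.18}$$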


Now the asymptotic expansion is obtained by applying integration by parts re- 
peatedly to the inner integral of (D.18): 


'=t(r^> + >‘")?r +R 

n— 1 


(D.19) 


20 ! 


with remainder 


and $f^{(n)}(0^+) = \lim_{\rho\to 0^+} f^{(n)}(\nu(\rho))$. It is not immediately clear that any terms of the
expansion (D.19) are finite since it has not yet been shown that the limits $f_\theta^{(n)}(0^+)$
exist and are integrable in $\theta$. In fact we will demonstrate that $f_\theta^{(n)}(\nu)$ is uniformly
bounded on $[0,2\pi]\times[0,\nu^*]$ by showing how to compute the limits. Thus
$R_N = O(1/r^{N+1})$.

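Let us begin by calculating the first asymptotic term to get the constant $K$ in
(D.7). Note that for $\rho \sim 0^+$

$$(f_\theta(\nu))^{-1} = w\frac{t}{\rho} - \frac{s_z}{\rho}q\mathrm{i} \sim w(c_\theta^2 + s_\theta^2) - q\mathrm{i} \sim w - q\mathrm{i} \tag{D.20}$$

from (D.18), (D.17), and (D.12). Next note that
$q(\theta) = (u^2 + v^2)^{1/2}\cos(\theta - \theta_0) = (1 - w^2)^{1/2}\cos(\theta - \theta_0)$ for $\theta_0 = \tan^{-1}(v/u)$. Therefore

$$\begin{aligned}
\int_0^{2\pi} f_\theta(0^+)\,d\theta &= \int_0^{2\pi}\frac{w + q(\theta)\mathrm{i}}{w^2 + q^2(\theta)}\,d\theta
= \int_0^{2\pi}\frac{w + \cos(\theta)(1 - w^2)^{1/2}\mathrm{i}}{w^2 + (1 - w^2)\cos^2(\theta)}\,d\theta \\
&= 4w\int_0^{\pi/2}\frac{d\theta}{\cos^2(\theta) + w^2\sin^2(\theta)}
= 4w\int_0^{\pi/2}\frac{\sec^2(\theta)\,d\theta}{1 + w^2\tan^2(\theta)} = 2\pi
\end{aligned} \tag{D.21}$$

so from (D.13)

$$K = -\frac{\Delta x\Delta y\Delta z}{4\pi}.$$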
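
The angular integral in (D.21) can be checked numerically. The sketch below is an illustration only: it assumes the leading-order integrand $f_\theta(0^+) = 1/(w - q(\theta)\mathrm{i})$, an arbitrary test direction with $w > 0$, and a periodic rectangle rule, and it then forms $K$ from the prefactor $-\Delta x\Delta y\Delta z/8\pi^2$ of (D.13). The function name and sample values are hypothetical.

```python
import numpy as np

def first_term_integral(u, v, w, samples=20000):
    # Periodic rectangle rule for the integral of 1/(w - q(theta) i) over [0, 2 pi).
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    q = u * np.cos(theta) + v * np.sin(theta)
    f0 = 1.0 / (w - 1j * q)
    return f0.mean() * 2.0 * np.pi

direction = np.array([1.0, 2.0, 0.5])         # arbitrary direction with w > 0
u, v, w = direction / np.linalg.norm(direction)
value = first_term_integral(u, v, w)

dx, dy, dz = 1.0, 0.5, 2.0                    # arbitrary grid spacings
K = -(dx * dy * dz) / (8.0 * np.pi**2) * value.real

print(value)                                  # close to 2*pi with negligible imaginary part
print(K, -dx * dy * dz / (4.0 * np.pi))
```

The result is independent of the chosen direction, which is what makes the leading term of the expansion a pure $K/r$ behavior.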

Next consider the first derivative term. Use the notation $W = w - q\mathrm{i}$, so that
$D/\rho \sim W$ for $\rho \sim 0^+$ by (D.20). Then

$$f' = \frac{\rho'}{D} - \frac{\rho D'}{D^2} = \frac{\rho' - fD'}{D} \tag{D.22}$$

and by (D.17)

$$\begin{aligned}
\rho' &\sim W^{-1} \\
t' &= (c_xc_\theta^2 + c_ys_\theta^2)\rho' \sim \rho' \sim W^{-1} \\
\zeta' &\sim (c_\theta^2 + s_\theta^2)/(D/\rho) \sim W^{-1}
\end{aligned}$$

so

$$\begin{aligned}
D' &= wt' - c_z\zeta'q\mathrm{i} \sim (w - q\mathrm{i})/W = 1 \\
\rho' - fD' &\sim W^{-1} - W^{-1} \sim 0.
\end{aligned} \tag{D.23}$$

Therefore a version of L’Hospital’s rule may be applied to evaluate (D.22):

$$f' \sim \frac{\rho'' - fD'' - f'D'}{D'} \quad\text{or}\quad f' \sim \tfrac{1}{2}(\rho'' - fD''). \tag{D.24}$$

This version may be stated in the following way. If $h(x)g(x) = f(x)$ for $g$ and $f$
continuous on $[0, x^*]$ with $g(0) = f(0) = 0$, and $f$ and $g$ continuously differentiable on
$(0, x^*)$ with $f'(x) = f_1(x) + f_2(x)h(x)$ and such that the limits $g'(0^+)$, $f_1(0^+)$, $f_2(0^+)$
exist and $g'(0^+) - f_2(0^+) \ne 0$, then $h(0^+)$ exists and equals $f_1(0^+)/(g'(0^+) - f_2(0^+))$.
The usual proof by the mean value theorem applies.

Now by (D.23)

$$\begin{aligned}
D'' &= wt'' - s_z''q\mathrm{i} \\
t'' &= -(s_x\Delta x^2c_\theta^3 + s_y\Delta y^2s_\theta^3)(\rho')^2 + (c_xc_\theta^2 + c_ys_\theta^2)\rho'' \sim \rho'' \\
s_z'' &= s_z\Delta z^2(\zeta')^2 + c_z\zeta'' \sim \zeta''
\end{aligned} \tag{D.25}$$

Using (D.17), (D.23), and (D.25) and applying L’Hospital’s rule again leads to simultaneous
equations for $\rho''$ and $\zeta''$:

$$\begin{aligned}
\rho'' &\sim (s_z'' - \rho'D'' - \rho''D')/D' \\
&\sim \zeta'' - W^{-1}(w\rho'' - \zeta''q\mathrm{i}) - \rho'' \\
\zeta'' &\sim (t'' - \zeta'D'' - \zeta''D')/D' \\
&\sim \rho'' - W^{-1}(w\rho'' - \zeta''q\mathrm{i}) - \zeta''
\end{aligned} \tag{D.26}$$

Since these equations are homogeneous, they may be solved to get $\rho'' \sim \zeta'' \sim 0$.
Hence $f_\theta'(0^+) = 0$ by (D.24) and (D.25). To solve for $f_\theta^{(n)}(0^+)$ when $n \ge 2$, simply
generalize the method used for the first derivative term. Assume by induction that
the limits exist at $\nu = 0$ for $\rho^{(j+1)}$ and $\zeta^{(j+1)}$ when $j = 0, \ldots, n-1$. Then the
$t^{(j+1)}$ and $D^{(j+1)}$ limits also exist, as in (D.25). To derive the formula for $f^{(n)}$, note
that

$$\rho^{(n)} = (fD)^{(n)} = \sum_{j=0}^{n}\binom{n}{j}f^{(j)}D^{(n-j)}$$

so the $f^{(n)}D$ limit exists and

$$f^{(n)}D = \rho^{(n)} - F_n - nf^{(n-1)}D'$$

for

$$F_n = \sum_{j=0}^{n-2}\binom{n}{j}f^{(j)}D^{(n-j)}.$$

Also assume by induction that the $f^{(n)}D$ limit is zero, already verified by (D.22) and
(D.23) for $n = 1$. Then by L’Hospital’s rule

$$f^{(n)}D' \sim \rho^{(n+1)} - F_n' - nf^{(n-1)}D'' - nf^{(n)}D' \tag{D.27}$$

so

$$\rho^{(n+1)} - F_{n+1} - (n+1)f^{(n)}D' \sim 0,$$

verifying the $f^{(n+1)}D$ induction hypothesis and the formula

$$f^{(n)} \sim \frac{1}{n+1}\left(\rho^{(n+1)} - F_{n+1}\right) \tag{D.28}$$

provided that the existence of the $\rho^{(n+1)}$ and $\zeta^{(n+1)}$ limits has been demonstrated.

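To calculate $\rho^{(n+1)}$ and $\zeta^{(n+1)}$, just mimic the argument used in (D.27) - (D.28),
except that you end up with a pair of simultaneous equations as in (D.26). First
expand $s_z^{(n)} = (\rho'D)^{(n)}$ and $t^{(n)} = (\zeta'D)^{(n)}$ and assume by induction that the $\rho^{(n+1)}D$
and $\zeta^{(n+1)}D$ limits are zero. Then generalize the version of L’Hospital’s rule cited
above to the case of two simultaneous limits to get

$$\begin{aligned}
s_z^{(n+1)} - \rho'D^{(n+1)} - P_{n+1} - (n+1)\rho^{(n+1)} &\sim 0 \\
t^{(n+1)} - \zeta'D^{(n+1)} - Q_{n+1} - (n+1)\zeta^{(n+1)} &\sim 0
\end{aligned} \tag{D.29}$$

where

$$P_{n+1} = \sum_{j=1}^{n-1}\binom{n+1}{j}\rho^{(j+1)}D^{(n+1-j)}, \qquad
Q_{n+1} = \sum_{j=1}^{n-1}\binom{n+1}{j}\zeta^{(j+1)}D^{(n+1-j)}.$$

Evaluating $D^{(n+1)} = wt^{(n+1)} - s_z^{(n+1)}q\mathrm{i}$, $s_z^{(n+1)} \sim S_{n+1} + \zeta^{(n+1)}$, and
$t^{(n+1)} \sim T_{n+1} + \rho^{(n+1)}$ separates out all the remaining $\rho^{(n+1)}$ and $\zeta^{(n+1)}$ terms. Now
solve (D.29) by collecting these terms:

$$\begin{aligned}
-\left(n + 1 + \frac{w}{W}\right)\rho^{(n+1)} + \left(1 + \frac{q\mathrm{i}}{W}\right)\zeta^{(n+1)} &\sim P_{n+1} + \frac{w}{W}T_{n+1} - \left(1 + \frac{q\mathrm{i}}{W}\right)S_{n+1} \\
\left(1 - \frac{w}{W}\right)\rho^{(n+1)} - \left(n + 1 - \frac{q\mathrm{i}}{W}\right)\zeta^{(n+1)} &\sim Q_{n+1} - \left(1 - \frac{w}{W}\right)T_{n+1} - \frac{q\mathrm{i}}{W}S_{n+1}
\end{aligned}$$

or

$$\begin{aligned}
\rho^{(n+1)} &\sim \left[(n+1)w(S_{n+1} - T_{n+1}) - wQ_{n+1} - ((n+1)W - q\mathrm{i})P_{n+1}\right]/DD \\
\zeta^{(n+1)} &\sim \left[(n+1)q\mathrm{i}(S_{n+1} - T_{n+1}) + q\mathrm{i}P_{n+1} - ((n+1)W + w)Q_{n+1}\right]/DD
\end{aligned} \tag{D.30}$$

for $DD = (n+1)(n+2)W$.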
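
The closed-form solution (D.30) can be cross-checked against a direct numerical solve. The sketch below assumes one reading of the collected pair of equations, namely the 2 by 2 system obtained by substituting $\rho' \sim \zeta' \sim W^{-1}$ and the expressions for $D^{(n+1)}$, $s_z^{(n+1)}$, and $t^{(n+1)}$ into (D.29). The random values standing in for $S_{n+1}$, $T_{n+1}$, $P_{n+1}$, $Q_{n+1}$ are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
w, q = 0.6, 0.35                     # any w > 0 with |q| <= sqrt(1 - w^2)
W = w - 1j * q
S, T, P, Qn = rng.standard_normal(4) + 1j * rng.standard_normal(4)

m = n + 1
# Assumed collected system for (rho^(n+1), zeta^(n+1)).
A = np.array([[-(m + w / W), 1 + 1j * q / W],
              [1 - w / W, -(m - 1j * q / W)]])
rhs = np.array([P + (w / W) * T - (1 + 1j * q / W) * S,
                Qn - (1 - w / W) * T - (1j * q / W) * S])
rho, zeta = np.linalg.solve(A, rhs)

# Closed forms as printed in (D.30).
DD = (n + 1) * (n + 2) * W
rho_closed = (m * w * (S - T) - w * Qn - (m * W - 1j * q) * P) / DD
zeta_closed = (m * 1j * q * (S - T) + 1j * q * P - (m * W + w) * Qn) / DD

print(abs(rho - rho_closed), abs(zeta - zeta_closed))
```

Agreement to rounding error supports the stated denominator $(n+1)(n+2)W$, which is nonzero whenever $w > 0$.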

But $W(\theta)$ is uniformly bounded away from zero by the assumption that $w > 0$.
Therefore the use of the simultaneous limit version of L’Hospital’s rule is valid for all
$\theta$ and all the induction hypotheses are verified. Also the denominator of $f_\theta^{(n)}(0^+)$ is
a power of $W$, so $f_\theta^{(n)}(\nu)$ must be uniformly bounded on $[0,2\pi]\times[0,\nu^*]$. This fully
establishes both the finiteness and computability of the terms of the expansion
(D.19) and the $O(1/r^{N+1})$ property of its remainder. A simple consequence is the
free space condition (D.7). However the method used in the next section provides a
simpler way to derive convenient formulas for the higher order terms.

Second Approach

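In the second approach it is assumed that the expansion has the form

$$G(i,j,k) = K\left(\frac{\eta_1}{r} + \frac{\eta_2}{r^2} + \frac{\eta_3}{r^3} + \cdots\right) \tag{D.31}$$

where $\eta_n = \eta_n(\mathbf{u}, \Delta x, \Delta y, \Delta z)$, $\mathbf{u} = (u,v,w) = (x/r, y/r, z/r)$, and
$(x,y,z) = (i\Delta x, j\Delta y, k\Delta z)$. $G$ is assumed to be a smooth function of $x, y, z$ and each $\eta_n$ a
smooth function of $u, v, w$.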

The expansion (D.31) is substituted into the Poisson equation (D.6) and recursion
relations (D.11). But first all terms, except the $G(i,j,k)$ terms, are expanded in
Taylor series about $(x,y,z)$. For example,

$$G(i+1,j,k) = G(i,j,k) + \sum_{n=1}^{\infty} D_x^n G(i,j,k)\frac{\Delta x^n}{n!} \tag{D.32}$$

where $(D_x, D_y, D_z) = \nabla$ is the gradient operator. For each equation all terms that
have the same power of $1/r$ are collected and set to zero. For this purpose, it is easily
demonstrated by induction that $D_x^\ell(\eta_n/r^n) = O(1/r^{n+\ell})$.

Certain auxiliary equations are also necessary. These include the spherical equation

$$u^2 + v^2 + w^2 = 1 \tag{D.33}$$

and orthogonality relations of the form $F_r = \nabla F\cdot\mathbf{u} = 0$ for $F(x,y,z) = F(ur,vr,wr)$
a function whose value is independent of the scale factor $r$ for $r \ne 0$. In mathematical
terminology, $F$ is a function on the real projective plane. For example,

$$(\eta_n)_r = \nabla\eta_n\cdot\mathbf{u} = 0. \tag{D.34}$$


The l/r m+l equations, m > 1, that come from the recursion relations are 

»*.(£) = •*>■(£)+•£ 
where q m = (< 7 mii <7m2i Qmz) is defined by 

q„ = r" E (2( S ^ tor k m = integerpartof . 




(D.35) 


(D.36) 

(D.37) 


= D»{2t + 1) (2=E|) , D\j)= (Ax‘Di,Ay‘Di,&z l Di). 

Equations (D.34) - (D.35) evaluate to a 3 by 3 system of equations in $\nabla\eta_m$, which is
solved to get

$$\nabla\eta_m = \mathbf{u}\mathbf{u}^t\mathbf{q}_m - \mathbf{q}_m. \tag{D.38}$$

Thus $\nabla\eta_1 = 0$, so $\eta_1$ is constant. According to section D.2.2 this constant value is 1.
The $1/r^{m+2}$ equation that comes from the Poisson equation is


ly2 f V?£\ _i_ 'f' Y . ' Sm< 

2 ' rm ' £t(2^ + 2)! 


= 0. 


(D..39) 


Using (D.33) and (D.34) to simplify (D.39) yields 


T * / V 7 _ 

Vm = 7 -r— v 2 7; m + 2r m V J—iinL 

^(2£ + 2)l 

Now (D.38), (D.36), and (D.39) yield 


m 


V'V = ^ ( V, m . u) + r" £ g> u ‘ s -<) - V • s„ ( 


I 

*=1 


„m 

= r 


( 2 £ + 1)1 

\p • s m< 4- u^Ysyn/u -f 2u • s m //r 

^ (2^+1)! 


_ _ r m-i r V • s m< + (m — l)u ■ s m ^ 


E 

e=\ 


( 2 £ + 1 )! 


(D.40) 


(D.41) 


using the orthogonality relation $\nabla(r^{m+1}\mathbf{s}_{m\ell})\cdot\mathbf{u} = 0$ to get $(\nabla\mathbf{s}_{m\ell})\mathbf{u} = -(m+1)\mathbf{s}_{m\ell}/r$.

Thus substituting (D.41) into (D.40) gives the following recursion formula for the
asymptotic coefficients when $m \ge 2$.


Vm = Y2 


^TTi+l 


U + 


£r 


(2£ + l)\m \ ^(*+l)( m -l) 




(D.42) 


It is immediate from (D.42) that $\eta_2 = 0$. Also $\mathbf{s}_{m\ell}$ involves only lower even
asymptotic coefficients $\eta_n$ if $m$ is even, so (D.42) gives a proof by induction that
$\eta_m = 0$ for all even $m$. In addition, it is shown below that $\eta_{2k+1}$ may always be
expressed as a polynomial in factors of the form $U^{2i}(2j)$, as in (D.43) and (D.44):

V3 = ^U 2 (0) - jU 2 ( 2) + ^U 2 { 4) (D.43) 

Vs = ^(0) - ||r/ 4 (2) + l|^£/-(4) - ^<(6) 

+ (^’(2) - f^(4)) U\ 2) + ^l/=( 4)^(4) (D.44) 

where $U^i(j) = \Delta x^iu^j + \Delta y^iv^j + \Delta z^iw^j$. These formulas for $\eta_3$ and $\eta_5$ are good for
computation as well as compactness of expression.

First, it is clear by induction from (D.42) that $\eta_{2k+1}$ is always a homogeneous
polynomial in $\Delta x, \Delta y, \Delta z$ of degree $2k$, that the coefficients of this polynomial are
themselves polynomials in $u, v, w$, and that all powers of $u, v, w, \Delta x, \Delta y, \Delta z$ are even.
The way to get the special polynomial form used in (D.43) and (D.44) is as follows.
The basic idea is to first compute the $x$ partial derivatives of $u^j/r^i$ in the form


D 


k 

X 




k ]-k+7n 

E r i+j+2n 

n=0 


(D.45) 


with $a^k_{j,i}(n) = 0$ if $j - k + 2n < 0$. The same coefficients are valid for the corresponding
expansions of the $y$ partials of $v^j/r^i$ and the $z$ partials of $w^j/r^i$. Next extend (D.45)
to

d: 



Ax m x 3 ~ k+2n 


(Ay m y 3 + Az m z 3 )x~ k+2n ' 


^ u , x L±X X J ' k / \ 

S a j,»( n ) 17+7+2 n ^ a 0,i+j( n ) r t+j + 2n 

n= 0 L 

X — ' ( k r \ k \ Ax Uj— k + 2n jjm(. ^—k+2n 

E — -ht — + f/ 0)-p+r- 


n=0 L 


This formula may be extended further to compute the coefficient $\eta_{2k+1}$ by the following
method.


r 2k+3 V ■ D 2t (2£ + l) 


\ u 2k ~ 2t 0)1 

r 2k + \-2t 


£ [(<■&» - <i») 


(D.46) 


where $m_\ell = 2\ell + 2$, $m_n = 2n - 2\ell - 2$, and $m_k = 2k + 1 - 2\ell + j$. Exactly the same
formula is used to compute $r^{2k+2}\,\mathbf{u}\cdot\mathbf{D}^{2\ell}(2\ell+1)\left[U^{2k-2\ell}(j)/r^{2k+1-2\ell}\right]$, except that $m_\ell$
is changed to $2\ell + 1$ and $m_n$ to $2n - 2\ell$. Thus the $\eta_5$ formula (D.44) is obtained from
the $\eta_3$ formula (D.43) by using (D.46) to evaluate the recursion formula (D.42).

To get $\eta_7$ one must in addition evaluate the same operators on terms of the form
$U^{i_1}(j_1)U^{i_2}(j_2)/r^{2k+1-2\ell}$ where $i_1 + i_2 = 2k - 2\ell$. The result is



Jl +J 2 >™fc 


n=0 


(n) - <%.(») - +], + m„) 

+ (Crn.M - «&,(")) C‘ +2, (ji + m„)UHh) 

+ (<■£-» - <C.») C i,+I U + ™«W'(h) 

+ o (047) 


with $m_k$, $m_\ell$, and $m_n$ exactly as before.

Formula (D.47) generalizes to $U^i(j)$ products of arbitrary length by using the input-output
formula of combinatorial set theory.


D.2.3 The Three Plane Representation 

As described in section D.1.1, the Green’s function $G$ is computed from its asymptotic
values by solving a Neumann boundary value problem with a delta source function.
The theory of FFT solutions to Neumann problems is given in section D.3.1. Here
that theory is applied to get a simple formula, (D.51) below, for the DFT, or discrete
Fourier transform, $\hat{G}$, defined by (D.90), in terms of the boundary data.

The theory begins by transforming the boundary data into equivalent boundary
sources, as in section D.3.1. Thus the sources $p$ are all zero except for $p(0,0,0) = 1$,
$p(m_x,j,k) = g_x(j,k)$, $p(i,m_y,k) = g_y(i,k)$, and $p(i,j,m_z) = g_z(i,j)$, where

$$g_x(j,k) = -\frac{G(m_x+1,j,k) - G(m_x-1,j,k)}{\Delta x^2} \tag{D.48}$$

and $g_y$ and $g_z$ are defined similarly.

Next, these sources are cosine transformed in $x$, $y$, and $z$. To get the DFT $\hat{G}$,
this triple cosine transform is scaled by a factor of 8. To illustrate, the scaled triple
transform of $g_x$ is $(-1)^\alpha\hat{G}_x(\beta,\gamma)$ where

$$\hat{G}_x(\beta,\gamma) = 4{\sum_{j=0}^{m_y}}'\,{\sum_{k=0}^{m_z}}'\cos\left(\frac{\pi\beta j}{m_y}\right)\cos\left(\frac{\pi\gamma k}{m_z}\right)g_x(j,k) \tag{D.49}$$

and the $'$ denotes a scale factor of 1/2 on the initial and final summands.

Finally, the triple cosine transform is divided by the sum of the three eigenvalues,
as in section D.3.1. Here

$$e_x(\alpha) = \frac{2\cos(\pi\alpha/m_x) - 2}{\Delta x^2} \tag{D.50}$$

and $e_y$ and $e_z$ are defined similarly. Thus

$$\hat{G}(\alpha,\beta,\gamma) = \frac{1 + s_\alpha\hat{G}_x(\beta,\gamma) + s_\beta\hat{G}_y(\alpha,\gamma) + s_\gamma\hat{G}_z(\alpha,\beta)}{e_x(\alpha) + e_y(\beta) + e_z(\gamma)} \tag{D.51}$$

for $(\alpha,\beta,\gamma) \ne (0,0,0)$, $s_\alpha = (-1)^\alpha$, etc.

$\hat{G}(0,0,0)$ requires a special computation since the denominator of (D.51) is zero.
But first note that the numerator must also be zero, which is the consistency condition:

$$\hat{G}_x(0,0) + \hat{G}_y(0,0) + \hat{G}_z(0,0) = -1. \tag{D.52}$$

Formula (D.52) is an excellent check of numerical accuracy. The code automatically
scales the left side of (D.52) by a factor, labeled RSC, which forces (D.52) to be
satisfied to machine accuracy. Then the deviation of RSC from 1 is used as a measure
of the number of significant digits in the Green’s function.

To compute $\hat{G}(0,0,0)$, first compute $G(m_x,m_y,m_z)$ to maximum accuracy by the
asymptotic expansion (D.1). Then represent $G(m_x,m_y,m_z)$ by the inverse DFT formula

$$\begin{aligned}
G(m_x,m_y,m_z) &= \frac{1}{m_xm_ym_z}{\sum_{\alpha=0}^{m_x}}'\,{\sum_{\beta=0}^{m_y}}'\,{\sum_{\gamma=0}^{m_z}}'\cos\alpha\pi\,\cos\beta\pi\,\cos\gamma\pi\;\hat{G}(\alpha,\beta,\gamma) \\
&= \frac{1}{m_xm_ym_z}{\sum_{\alpha=0}^{m_x}}'\,{\sum_{\beta=0}^{m_y}}'\,{\sum_{\gamma=0}^{m_z}}'(-1)^{\alpha+\beta+\gamma}\hat{G}(\alpha,\beta,\gamma).
\end{aligned} \tag{D.53}$$

Solving (D.53) for $\hat{G}(0,0,0)$ gives

$$\hat{G}(0,0,0) = 8\left(m_xm_ym_zG(m_x,m_y,m_z) - {\sum}'(-1)^{\alpha+\beta+\gamma}\hat{G}(\alpha,\beta,\gamma)\right) \tag{D.54}$$

where the sum extends over all $(\alpha,\beta,\gamma)$, except $(0,0,0)$, as in (D.53).


D.2.4 The Downstream Green’s Function 

The downstream Green’s function has been defined by the summation (D.4), but the
actual value of this summation is always infinite, as is shown below. However it is
also shown below that definition (D.4) is mathematically valid in the following sense.
Interpret $G_d$ as a linear operator on the set $V_0$ of all downstream source functions
that sum to zero:

$$G_d(i,\cdot,\cdot) * \tau = \sum_{l=m_x}^{\infty} G(l-i,\cdot,\cdot) * \tau \tag{D.55}$$

for

$$\tau \in V_0 = \left\{ f : \sum_{j=0}^{m_y}\sum_{k=0}^{m_z} f(j,k) = 0 \right\}. \tag{D.56}$$

If $G$ is already known on the computational box, then only the summations outside
the box need be investigated in order to compute $G_d$. Outside the box the asymptotic
expansion (D.2) may be used to compute $G$. The Euler-Maclaurin formula permits
the infinite sum of each asymptotic term to be computed by the corresponding infinite
integral plus correction terms derived from the derivatives of the summand function
at the lower boundary (downstream edge of the box). That is, write

$$G_d(i,j,k) = G_d(-1,j,k) + \sum_{l=m_x-i}^{m_x} G(l,j,k), \tag{D.57}$$

and compute

$$\begin{aligned}
G_d(-1,j,k) &= \sum_{l=1}^{\infty} G(m_x + l,j,k) \\
&= \frac{1}{\Delta x}\int_{x_0}^{\infty} G(x,j,k)\,dx - \frac{1}{2}G(x_0,j,k) - \frac{\Delta x}{12}D_xG(x_0,j,k) \\
&\qquad - \cdots - \frac{B_{2i}\,\Delta x^{2i-1}}{(2i)!}D_x^{2i-1}G(x_0,j,k) - \cdots
\end{aligned} \tag{D.58}$$

where $x_0 = m_x\Delta x$ and $B_{2i}$ is a Bernoulli number.
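
The Euler-Maclaurin idea of replacing an infinite tail sum by an integral plus end-point corrections can be tried on a simple convergent series. The sketch below is a generic illustration, not the TRANAIR computation: it sums $1/l^2$ from $l = m$ to infinity (so the end-point term enters with $+\tfrac{1}{2}f(m)$, unlike a sum that starts at $m + 1$), uses the Bernoulli numbers $B_2 = 1/6$ and $B_4 = -1/30$, and compares with the exact tail obtained from $\pi^2/6$.

```python
import math

def tail_sum_euler_maclaurin(m):
    # Approximates sum_{l=m}^{inf} 1/l^2 by the integral plus end-point corrections.
    integral = 1.0 / m                 # integral of x^-2 from m to infinity
    f = 1.0 / m**2                     # f(m)
    d1 = -2.0 / m**3                   # f'(m)
    d3 = -24.0 / m**5                  # f'''(m)
    # -B2/2! * f'(m) - B4/4! * f'''(m), with B2 = 1/6 and B4 = -1/30
    return integral + 0.5 * f - d1 / 12.0 + d3 / 720.0

m = 10
exact = math.pi**2 / 6 - sum(1.0 / l**2 for l in range(1, m))
approx = tail_sum_euler_maclaurin(m)
print(exact, approx, abs(exact - approx))
```

Only a few correction terms are needed when the lower limit is large, which is the situation at the downstream edge of the box.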

Applying (D.58) to the expansion (D.2) shows that the infinite part comes from
the integral

$$\begin{aligned}
\int_{x_0}^{N}\frac{dx}{|(x,y,z)|} &= \ln\left[N + \sqrt{N^2 + y^2 + z^2}\right] - \ln(x_0 + r) \\
&= \ln(2N) - \ln(x_0 + r) + O(1/N^2)
\end{aligned} \tag{D.59}$$

for $N \sim \infty$ and $r = |(x_0,y,z)|$. The infinite term $\ln(2N)$ is constant for all
$j, k$, and a constant vanishes upon convolution with a function in $V_0$. Therefore the
operator formula (D.55) is well defined.
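
The cancellation of the infinite part can be observed with the leading $1/r$ term alone. The sketch below is a simplified illustration under stated assumptions: unit grid spacing, two transverse distances `rho_a` and `rho_b` standing in for two $(j,k)$ locations, and a zero-sum source with weights $+1$ and $-1$. Each partial sum grows logarithmically with the upper limit, but their difference converges.

```python
import numpy as np

def partial_sum(rho, m, upper):
    # Sum of 1/sqrt(l^2 + rho^2) for l = m, ..., upper.
    l = np.arange(m, upper + 1, dtype=float)
    return np.sum(1.0 / np.sqrt(l**2 + rho**2))

m = 8
rho_a, rho_b = 1.0, 3.0
results = {}
for upper in (10**4, 10**5):
    sa = partial_sum(rho_a, m, upper)
    sb = partial_sum(rho_b, m, upper)
    results[upper] = (sa, sa - sb)
    print(upper, sa, sa - sb)
```

Raising the upper limit by a factor of 10 adds about $\ln 10$ to each individual sum while leaving the difference essentially unchanged.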

The simplest way to make $G_d$ well defined as a function in the context of the
Euler-Maclaurin expansion is to delete the infinite term from the integration (D.59).
However, this is not quite enough because the goal is to compute $\hat{G}_d$ in the same way
as $\hat{G}$ is computed: First compute $G_d$ values at the boundary of the computational
box, then solve a Neumann boundary value problem. For this purpose $\hat{G}_d(0,0,0)$
requires a separate computation, just as $\hat{G}(0,0,0)$ did in section D.2.3. But in this
case $G_d$ is actually only defined up to an arbitrary constant, according to (D.55) and
(D.56). That is, the value of $\hat{G}_d(0,0,0)$ may be chosen arbitrarily. The simplest choice
is $\hat{G}_d(0,0,0) = 0$, and this definition also has the advantage that it tends to minimize
the effects of any numerical deviation from the downstream condition (D.56).

Another aspect of the Neumann boundary value approach to G d is that four bound- 
ary planes of data, instead of three, are required due to the lack of symmetry along 
the x axis. The equivalent sources on the i = 0 and i = m x planes are 


9do(j, k ) 


9dx(j, k ) 


Gd(-l,j,k) - G d (+l,j,k) 

Ax 2 

Gjjrix — 1, j, k) + GjjrixJ, k) 

Ax 2 


+ * 0 ', k) 

+ 6(j, k) 


Gd{m x T 1, J, k ) Gd(m x 1 , J 1 ', k ) 

/\ t ^ 

G(0,j t k) + G(lJ,k) 

Ax 2 


From (D.57) the boundary plane at j = m y is 


(D.60) 


(D.61) 


9dy(i, k ) 


Gj(f,m v + MQ - Gdi^rriy - l,fc) 

Ay 2 

Gj( — 1, m v + 1, k) — <jj(— 1, m v — 1, k) 

Ay 2 

m x 

+ £ «,(<,*) 


(D.62) 


and similarly for $g_{dz}(i,j)$. A scaled triple cosine transform is applied to each of planes
(D.60) - (D.62), just as in section D.2.3, to get the four planes $\hat G_{d0}$, $\hat G_{dx}$, $\hat G_{dy}$, $\hat G_{dz}$.
The result is

$$\hat G_d(\alpha,\beta,\gamma) = \frac{\hat G_{d0}(\beta,\gamma) + s_\alpha \hat G_{dx}(\beta,\gamma) + s_\beta \hat G_{dy}(\alpha,\gamma) + s_\gamma \hat G_{dz}(\alpha,\beta)}{e_x(\alpha) + e_y(\beta) + e_z(\gamma)} \tag{D.63}$$

for $(\alpha,\beta,\gamma) \ne (0,0,0)$, $\hat G_d(0,0,0) = 0$, $s_\alpha = (-1)^\alpha$, etc.

If $G$ has been computed to an error of $O(1/r^7)$ by the expansion (D.2), then the
infinite summation (D.4) gives a $G_d$ error of $O(1/r^6)$ in the following way. The $1/r$
term in the expansion of $G$ contributes


1 Axu Ax 3 (9 u — 15u 3 ) 
P{l/r) = - ln(x 0 + r) - — + ■ 


(D.64) 


where u = x 0 /r . According to (D.43) the r? 3 /r 3 contribution requires three Euler- 
M aclaurin calculations: 

* <»*/'■> - hr, - $ - £ ( 2t “ 2i “ - < 3 + <d m) 

for i = 0, 1,2. By (D.44) the r) 5 /r 5 contribution requires five calculations: 




„2t 


,2 i 


00 X U* 

Ax Jx o r 5+2 ‘ 2r 5 


(D.66) 


for i = 0, 1,2, 3, 4. 

Using integration by parts, each of the integrals in (D.65) may be reduced to 

[°° = 1__ (D.67) 

Jx o r 3 1 + u r 2 ’ 


r 

J XQ 


(D.68) 


Jx 0 r" i + 

and each of the integrals in (D.66) may be reduced to 

w di u + 2 1 

= 3(u + 1) 2 K 

Thus for (x 0 , y, z) = {m x AxJAy,kAz), V'(j) = AyV + Az'V, and U'(0) = 
Ax* + Ay* + Az', compute 

AxAyAz J p(1/r) + Qu 2 (0) - jV 2 (2) + ^ 2 (4)) P(l/r 3 ) 


Gd(~ 1, j, k) — 


- jAx 2 P( U J /r 3 ) + ^Ax’/V/r 3 ) + (A. M I0) - ||V(2) + i|V(4) 

_ I®V( 6 ) + - 1^(2) + gv 3 (4>) U\ 0) 

(4)) V 3 (2) + P(l/r 5 ) 


16 


+ (f^,2,-f, 3 


+ (-^|ax 2 - ||c/ 2 (0) + ^V l (2) - fv 3 (4)) A x 3 /V/r s ) 
+ (|C 2 (0) + ^Ax 3 - ^V»(2) + ^V 3 (4)) Ax’/V/r*) 


+ ^Ax*P( u «/r s ) 


(D.69) 


D.3 THEORY OF THE CONVOLUTION ALGORITHM

D.3.1 FFT Dirichlet and Neumann Poisson Solvers 

FFT Poisson solver methods are described for a rectangular domain (box) in a Cartesian
grid of arbitrary dimension. At each face the boundary conditions may be chosen
independently to be Dirichlet or Neumann. Neumann boundary conditions are of the
central difference kind, centered on a face.

Additional boundary options include periodic boundary conditions and also Dirichlet
or Neumann boundary conditions of the simple forward or backward difference
kind, centered on a half grid line just inside a face.

The methods employ FFT sine and cosine transforms and shifted sine and cosine
transforms. Optionally, tridiagonal solvers, which are faster, may replace the "last"
transform.

These transforms are derived as eigenvectors of the matrices that describe the
problems. For example, a 1D Poisson problem $D^2\phi = \rho$ may be written as a matrix
equation $Ax = b$ and solved by computing $x = E(\Lambda^{-1}(E^{-1}b))$, where $E$ is the eigenvector
matrix and $\Lambda$ is the diagonal eigenvalue matrix. In general such a procedure
is extremely inefficient, but the availability of analytical formulas for $\Lambda$ and $E$ and of
FFT techniques for the matrix-vector multiplies makes the method very attractive.
Swarztrauber [94] also presents much of the following material.
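
The eigenvector solution $x = E(\Lambda^{-1}(E^{-1}b))$ can be sketched for the 1D full Dirichlet case. This is an illustration under stated assumptions, not the report's code: it uses the standard sine eigenvectors $\sin(jk\pi/m)$ with eigenvalues $-4\sin^2(k\pi/2m)$ and the inverse $E^{-1} = (2/m)E$, and it replaces the FFT by plain $O(m^2)$ sums for clarity. The function name `solve_dirichlet_1d` is hypothetical.

```python
import math


def solve_dirichlet_1d(rho, phi0, phim, dx):
    """Solve D^2 phi = rho at interior nodes 1..m-1 with phi(0), phi(m) given.

    rho holds rho(1), ..., rho(m-1). Plain sums stand in for the sine FFT.
    """
    m = len(rho) + 1
    # Right-hand side of the matrix equation, boundary values moved across.
    b = [dx * dx * r for r in rho]
    b[0] -= phi0
    b[-1] -= phim
    # Forward step E^{-1} b, using E^{-1} = (2/m) E for the sine eigenvectors.
    bt = [(2.0 / m) * sum(math.sin(j * k * math.pi / m) * b[j - 1]
                          for j in range(1, m))
          for k in range(1, m)]
    # Divide by the eigenvalues lambda_k = -4 sin^2(k pi / 2m).
    yt = [bt[k - 1] / (-4.0 * math.sin(k * math.pi / (2 * m)) ** 2)
          for k in range(1, m)]
    # Inverse step E y.
    return [sum(math.sin(j * k * math.pi / m) * yt[k - 1] for k in range(1, m))
            for j in range(1, m)]


# Example: phi(x) = x^2 on [0, 1]; the second difference of a quadratic is exactly 2.
m = 8
phi = solve_dirichlet_1d([2.0] * (m - 1), 0.0, 1.0, 1.0 / m)
```

In this example the computed values reproduce $(j/m)^2$ to rounding error, since the central difference operator is exact for quadratics.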

Derivation of the Matrix Equations 

In a 1D Dirichlet problem $D^2\phi = \rho$ the endpoint values $\phi(0)$ and $\phi(m)$ are specified.
Thus the problem may be rewritten as the matrix equation

$$Ax = b \tag{D.70}$$

where $A$ and $b$ are given by

$$\begin{aligned}
-2x_1 + x_2 &= \Delta x^2 \rho(1) - \phi(0) \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
x_{m-2} - 2x_{m-1} &= \Delta x^2 \rho(m-1) - \phi(m).
\end{aligned} \tag{D.71}$$

Next let $D^2\phi = \rho$ represent a central difference Neumann problem with endpoint
specifications $d_b = (\phi(-1) - \phi(1))/2\Delta x$ and $d_e = (\phi(m+1) - \phi(m-1))/2\Delta x$. Then
the matrix equation is given by

$$\begin{aligned}
-2x_0 + 2x_1 &= \Delta x^2 \rho(0) - 2\Delta x\, d_b \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
2x_{m-1} - 2x_m &= \Delta x^2 \rho(m) - 2\Delta x\, d_e.
\end{aligned} \tag{D.72}$$

A Dirichlet condition on the left with a Neumann condition on the right gives

$$\begin{aligned}
-2x_1 + x_2 &= \Delta x^2 \rho(1) - \phi(0) \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
2x_{m-1} - 2x_m &= \Delta x^2 \rho(m) - 2\Delta x\, d_e.
\end{aligned} \tag{D.73}$$

A Dirichlet condition on the right with a Neumann condition on the left gives

$$\begin{aligned}
-2x_0 + 2x_1 &= \Delta x^2 \rho(0) - 2\Delta x\, d_b \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
x_{m-2} - 2x_{m-1} &= \Delta x^2 \rho(m-1) - \phi(m).
\end{aligned} \tag{D.74}$$

For the periodic boundary condition $\phi(m + j) = \phi(j)$, the result is

$$\begin{aligned}
-2x_0 + x_1 + x_{m-1} &= \Delta x^2 \rho(0) \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
x_0 + x_{m-2} - 2x_{m-1} &= \Delta x^2 \rho(m-1).
\end{aligned} \tag{D.75}$$

For Dirichlet half grid line boundary conditions, $a_b = (\phi(0) + \phi(1))/2$ and
$a_e = (\phi(m-1) + \phi(m))/2$ are specified. This gives

$$\begin{aligned}
-3x_1 + x_2 &= \Delta x^2 \rho(1) - 2a_b \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
x_{m-2} - 3x_{m-1} &= \Delta x^2 \rho(m-1) - 2a_e.
\end{aligned} \tag{D.76}$$

For Neumann half grid line boundary conditions, $d_b = (\phi(0) - \phi(1))/\Delta x$ and
$d_e = (\phi(m) - \phi(m-1))/\Delta x$ are specified. This gives

$$\begin{aligned}
-x_1 + x_2 &= \Delta x^2 \rho(1) - \Delta x\, d_b \\
x_{j-1} - 2x_j + x_{j+1} &= \Delta x^2 \rho(j) \\
x_{m-2} - x_{m-1} &= \Delta x^2 \rho(m-1) - \Delta x\, d_e.
\end{aligned} \tag{D.77}$$

Of course, mixed half grid line Dirichlet-Neumann problems are also possible by
starting out as in (D.76) and ending as in (D.77), or vice versa.

If the Poisson problem is a 2D problem, let $A_x$ be the 1D matrix for the x direction
and $A_y$ be the 1D matrix for the y direction. Let $I_x$ and $I_y$ be the corresponding
identity matrices, and let $\otimes$ denote the tensor product. Then the matrix equation
becomes

$$\left( \frac{A_x \otimes I_y}{\Delta x^2} + \frac{I_x \otimes A_y}{\Delta y^2} \right) x = b \tag{D.78}$$

(The tensor product of two matrices $A_{n \times m}$ and $B_{o \times p}$ is a linear operator on matrices
of order $m \times p$ that gives $n \times o$ matrices. It may be defined by

$$((A \otimes B)x)_{kl} = \sum_{i}\sum_{j} A_{ki} B_{lj}\, x_{ij}$$

or, equivalently,

$$((A \otimes B)x)_{kl} = \sum_{j} B_{lj}\, z_{kj} \quad \text{for} \quad z_{kj} = \sum_{i} A_{ki}\, x_{ij}.$$

That is, first operate on the rows by $B$, then on the columns of the result by $A$, or
reverse the order to operate first on the columns by $A$, then on the rows by $B$.) If, for
example, $A_x$ is Neumann on the left and Dirichlet on the right with $A_y$ full Dirichlet,
the $b$ in (D.78) is given by the computer operations


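$$\begin{aligned}
b_{ij} &:= \rho(i,j) && \text{for } 0 \le i < m_x,\ 0 < j < m_y \\
b_{0j} &:= b_{0j} - \frac{2\,d_b(j)}{\Delta x} && \text{for } 0 < j < m_y \\
b_{m_x-1,j} &:= b_{m_x-1,j} - \frac{\phi(m_x,j)}{\Delta x^2} && \text{for } 0 < j < m_y \\
b_{i,1} &:= b_{i,1} - \frac{\phi(i,0)}{\Delta y^2} && \text{for } 0 \le i < m_x \\
b_{i,m_y-1} &:= b_{i,m_y-1} - \frac{\phi(i,m_y)}{\Delta y^2} && \text{for } 0 \le i < m_x.
\end{aligned} \tag{D.79}$$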


Both (D.78) and (D.79) generalize directly to higher dimensions, with the 3D equation
being

$$\left( \frac{A_x \otimes I_y \otimes I_z}{\Delta x^2} + \frac{I_x \otimes A_y \otimes I_z}{\Delta y^2} + \frac{I_x \otimes I_y \otimes A_z}{\Delta z^2} \right) x = b. \tag{D.80}$$
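
The action of a tensor product on a 2D array, as used in (D.78) and (D.80), can be sketched as follows. The sketch assumes the convention that $A$ acts on the first index and $B$ on the second, so that $(A \otimes B)x = A\,x\,B^T$; the helper names (`matmul`, `transpose`, `tensor_apply_*`) are hypothetical and written with plain lists to stay self-contained.

```python
def matmul(p, q):
    """Plain list-of-lists matrix product."""
    return [[sum(p[r][s] * q[s][c] for s in range(len(q)))
             for c in range(len(q[0]))]
            for r in range(len(p))]


def transpose(p):
    return [list(col) for col in zip(*p)]


def tensor_apply_columns_first(a, b, x):
    """(A (x) B) x computed as (A x) B^T: apply A to the first index, then B."""
    return matmul(matmul(a, x), transpose(b))


def tensor_apply_rows_first(a, b, x):
    """Same operator computed as A (x B^T): apply B to the second index first."""
    return matmul(a, matmul(x, transpose(b)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]      # swaps the two entries of the second index
X = [[1, 2], [3, 4]]
result = tensor_apply_columns_first(A, B, X)
```

Both orders of application give the same array, which is the property the text relies on when the transforms are applied one direction at a time.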


Eigenvectors and Eigenvalues 

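The eigenvector-eigenvalue solution to equation (D.70) is simply

$$x = A^{-1}b = (E\Lambda E^{-1})^{-1}b = E(\Lambda^{-1}(E^{-1}b)) \tag{D.81}$$

where $E = (e_{jk})$ is the eigenvector matrix of $A$ and $\Lambda = (\lambda_k)$ is the diagonal eigenvalue
matrix. This generalizes directly to the multi-dimensional tensor product formulation
as follows. Applying the tensor product properties

$$(A \otimes B)\,[(C \otimes D) + (C' \otimes D')] = (AC \otimes BD) + (AC' \otimes BD')$$

$$[(A \otimes B) + (A' \otimes B')]\,(C \otimes D) = (AC \otimes BD) + (A'C \otimes B'D)$$

to (D.78), for example, gives

$$x = (E_x \otimes E_y) \left( \frac{\Lambda_x \otimes I_y}{\Delta x^2} + \frac{I_x \otimes \Lambda_y}{\Delta y^2} \right)^{-1} (E_x^{-1} \otimes E_y^{-1})\, b. \tag{D.82}$$

The corresponding 3D solution is

$$x = (E_x \otimes E_y \otimes E_z)\, y \tag{D.83}$$

where

$$y = \left( \frac{\Lambda_x \otimes I_y \otimes I_z}{\Delta x^2} + \frac{I_x \otimes \Lambda_y \otimes I_z}{\Delta y^2} + \frac{I_x \otimes I_y \otimes \Lambda_z}{\Delta z^2} \right)^{-1} b^{xyz}$$

and

$$b^{xyz} = (E_x^{-1} \otimes E_y^{-1} \otimes E_z^{-1})\, b.$$

Another way to write (D.83) is

$$\left( \frac{\lambda_x(\alpha)}{\Delta x^2} + \frac{\lambda_y(\beta)}{\Delta y^2} + \frac{\lambda_z(\gamma)}{\Delta z^2} \right) x^{xyz}(\alpha,\beta,\gamma) = b^{xyz}(\alpha,\beta,\gamma). \tag{D.84}$$

The three plane representation formula (D.37) for the Green's function is derived
from (D.84).

Now the task is to find the eigenvalues and eigenvectors for the matrices (D.71) -
(D.77). These are given in table D.1.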

There are certain useful relations between some of the eigenvector matrices of
table D.1. Let $S$ be the operator that reverses the order of the rows when multiplied
on the left, and let $T$ negate the even numbered columns when multiplied on the
right. Then

$$\begin{aligned}
E_{D_hD_h}(m) &= E_{DN}^{T}(m-1) \\
E_{N_hN_h}(m) &= E_{ND}^{T}(m-1) \\
E_{P}(m) &= E_{NN}(m/2) + E_{DD}(m/2)\,i \\
E_{DN} &= S\,E_{ND}\,T \\
E_{D_hD_h} &= T\,E_{N_hN_h}\,S \\
E_{D_hN_h} &= S\,E_{N_hD_h}\,T.
\end{aligned} \tag{D.85}$$

Table D.1 may be verified by substituting into the appropriate matrix equation,
(D.71) - (D.77), and using trigonometric identities. For example, in the half grid line
Neumann case (D.77) let $n = m - 1$ and $\theta = k\pi/2n$. Then

$$\begin{aligned}
(Ae_k)_1 &= -\cos\theta + \cos 3\theta \\
&= -\cos\theta + \cos\theta\cos 2\theta - \sin\theta\sin 2\theta \\
&= \cos\theta\,[-1 + (1 - 2\sin^2\theta) - 2\sin^2\theta] \\
&= -4\sin^2\theta\cos\theta \\
(Ae_k)_j &= \cos(2j-3)\theta - 2\cos(2j-1)\theta + \cos(2j+1)\theta \\
&= (\cos 2\theta - 2 + \cos 2\theta)\cos(2j-1)\theta \\
&= -4\sin^2\theta\,\cos(2j-1)\theta \\
(Ae_k)_n &= \cos(2n-3)\theta - \cos(2n-1)\theta \\
&= (-1)^k(\cos 3\theta - \cos\theta) \\
&= -4\sin^2\theta\,\cos(2n-1)\theta
\end{aligned}$$
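
The same verification can be done numerically. The following sketch assumes the half grid line Neumann matrix of (D.77), with $n = m - 1$ unknowns, and checks that $e_{jk} = \cos((2j-1)\theta)$ with $\theta = k\pi/2n$ satisfies $Ae_k = -4\sin^2\theta\, e_k$. The function names are hypothetical.

```python
import math


def neumann_half_matrix(n):
    """Tridiagonal matrix of (D.77): -1 in the two corner diagonals, -2 elsewhere."""
    a = [[0.0] * n for _ in range(n)]
    for r in range(n):
        a[r][r] = -2.0
        if r > 0:
            a[r][r - 1] = 1.0
        if r < n - 1:
            a[r][r + 1] = 1.0
    a[0][0] = -1.0
    a[n - 1][n - 1] = -1.0
    return a


def eigen_residual(n, k):
    """Largest entry of |A e_k - lambda_k e_k| for the claimed eigenpair."""
    theta = k * math.pi / (2 * n)
    e = [math.cos((2 * j - 1) * theta) for j in range(1, n + 1)]
    lam = -4.0 * math.sin(theta) ** 2
    a = neumann_half_matrix(n)
    return max(abs(sum(a[r][c] * e[c] for c in range(n)) - lam * e[r])
               for r in range(n))


worst = max(eigen_residual(7, k) for k in range(7))   # k = 0, ..., n-1
```

The residual is at rounding level for every $k$, including $k = 0$, where the eigenvalue is zero and the eigenvector is constant.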


There is one case in which the solution formula (D.83) fails. Namely, a zero sum-of-eigenvalues
cannot be inverted. According to table D.1 this occurs for $\alpha = \beta = \gamma = 0$
when every face has a central difference Neumann boundary condition. In this case the
solution $x$ is determined only up to an additive constant, and an additional equation
is needed to determine this constant. In addition (D.84) shows that the consistency
condition

$$b^{xyz}(0,0,0) = 0 \tag{D.86}$$

must hold. Again, the three plane Green's function formula (D.37) is an instance.


Problem*   Eigenvectors E                           Eigenvalues Λ                             Indices

DD         $\sin(jk\pi/m)$                          $-4\sin^2(k\pi/2m)$                       $1 \le j \le m-1$,  $1 \le k \le m-1$
NN         $\cos(jk\pi/m)$                          $-4\sin^2(k\pi/2m)$                       $0 \le j \le m$,    $0 \le k \le m$
DN         $\sin(j(k-\frac12)\pi/m)$                $-4\sin^2((k-\frac12)\pi/2m)$             $1 \le j \le m$,    $1 \le k \le m$
ND         $\cos(j(k-\frac12)\pi/m)$                $-4\sin^2((k-\frac12)\pi/2m)$             $0 \le j \le m-1$,  $1 \le k \le m$
P          $\exp(2jk\pi i/m)$                       $-4\sin^2(k\pi/m)$                        $0 \le j \le m-1$,  $0 \le k \le m-1$
D_hD_h     $\sin((j-\frac12)k\pi/(m-1))$            $-4\sin^2(k\pi/2(m-1))$                   $1 \le j \le m-1$,  $1 \le k \le m-1$
N_hN_h     $\cos((j-\frac12)k\pi/(m-1))$            $-4\sin^2(k\pi/2(m-1))$                   $1 \le j \le m-1$,  $0 \le k \le m-2$
D_hN_h     $\sin((j-\frac12)(k-\frac12)\pi/(m-1))$  $-4\sin^2((k-\frac12)\pi/2(m-1))$         $1 \le j \le m-1$,  $1 \le k \le m-1$
N_hD_h     $\cos((j-\frac12)(k-\frac12)\pi/(m-1))$  $-4\sin^2((k-\frac12)\pi/2(m-1))$         $1 \le j \le m-1$,  $1 \le k \le m-1$

*Key:
D = standard Dirichlet            N = central difference Neumann
D_h = half grid line Dirichlet    N_h = half grid line Neumann
P = periodic

Table D.1: Eigenvectors and Eigenvalues


The Transforms 


To implement FFT algorithms for the eigenvector matrices of table D.1, several modifications
are useful. First note that if $F$ is any diagonal matrix, then $A = E\Lambda E^{-1} =
(EF)\Lambda(EF)^{-1}$. That is, the eigenvectors may be scaled in any way desired. For example,
if $F_{kk} = 1$ except that the first and last diagonal entries equal $\frac12$, then the cosine
transform $E$ of table D.1 may be replaced by the standard cosine transform $EF$.

But note that in (D.81) - (D.83) it is the inverse of an eigenvector matrix that is
first applied to the data vector $b$. Therefore the scale factors $F$ are chosen so that
$(EF)^{-1}$ is a standard forward transform. These transforms and their inverses are
listed in table D.2.

The $T_{DN}$ and $T_{ND}$ transforms, or their inverses $T_{D_hD_h}$ and $T_{N_hN_h}$, are referred to
as the shifted sine and shifted cosine transforms, respectively. The $T_{D_hN_h}$ and $T_{N_hD_h}$
transforms are the dual shifted sine and dual shifted cosine transforms.

A straightforward way to derive the eigenvector matrix inverses of table D.2 is to
compute $S = (s_k)$ for $s_k = |e_k|^2$ and $e_k$ = the $k$th eigenvector. Then $E^{-1} = S^{-1}E^T$.
Another way is to reduce the problem to standard DFT inversion by using symmetric
or antisymmetric data. For example, to invert $T_{N_hN_h}$ set $b_{2n+1-j} = b_j$ for $j = 1 \ldots n$
since $\cos(k(2n+1-j-\frac12)\pi/n) = \cos(k(j-\frac12)\pi/n)$. Also make the data periodic
of period $2n$ so that $b_0 = b_{2n}$. Then

$$\begin{aligned}
\hat b_k = (T_{N_hN_h} b)(k) &= \sum_{j=1}^{n} \cos\left(k(j-\tfrac12)\frac{\pi}{n}\right) b_j \\
&= \frac12 \exp\{-k\pi i/2n\} \sum_{j=0}^{2n-1} \exp\left\{kj\frac{\pi i}{n}\right\} b_j.
\end{aligned}$$

Now invert the DFT and use $\hat b_k = -\hat b_{2n-k}$ and $\hat b_n = 0$ to get

$$\begin{aligned}
b_j &= \frac{1}{2n} \sum_{k=0}^{2n-1} \exp\left\{-jk\frac{\pi i}{n}\right\} 2\exp\{k\pi i/2n\}\,\hat b_k \\
&= \frac{2}{n}\left[ \frac{\hat b_0}{2} + \sum_{k=1}^{n-1} \cos\left((j-\tfrac12)k\frac{\pi}{n}\right)\hat b_k \right] = \frac{2}{n}\,(T_{ND}\hat b)(j).
\end{aligned} \tag{D.87}$$


Forward Transform                                                                                         Inverse Transform

$(T_{DD}b)(k) = \sum_{j=1}^{m-1} \sin(jk\pi/m)\, b_j$                                                     $T_{DD}^{-1} = \frac{2}{m} T_{DD}$
$(T_{NN}b)(k) = \frac{b_0}{2} + \sum_{j=1}^{m-1} \cos(jk\pi/m)\, b_j + (-1)^k \frac{b_m}{2}$              $T_{NN}^{-1} = \frac{2}{m} T_{NN}$
$(T_{DN}b)(k) = \sum_{j=1}^{m-1} \sin(j(k-\frac12)\pi/m)\, b_j + (-1)^{k-1} \frac{b_m}{2}$                $T_{DN}^{-1} = \frac{2}{m} T_{D_hD_h}(m+1)$
$(T_{ND}b)(k) = \frac{b_0}{2} + \sum_{j=1}^{m-1} \cos(j(k-\frac12)\pi/m)\, b_j$                           $T_{ND}^{-1} = \frac{2}{m} T_{N_hN_h}(m+1)$
$(T_{P}b)(k) = \sum_{j=0}^{m-1} \exp(2jk\pi i/m)\, b_j$                                                   $T_{P}^{-1} = \frac{1}{m} \overline{T_{P}}$
$(T_{D_hD_h}b)(k) = \sum_{j=1}^{m-1} \sin((j-\frac12)k\pi/(m-1))\, b_j$                                   $T_{D_hD_h}^{-1} = \frac{2}{m-1} T_{DN}(m-1)$
$(T_{N_hN_h}b)(k) = \sum_{j=1}^{m-1} \cos((j-\frac12)k\pi/(m-1))\, b_j$                                   $T_{N_hN_h}^{-1} = \frac{2}{m-1} T_{ND}(m-1)$
$(T_{D_hN_h}b)(k) = \sum_{j=1}^{m-1} \sin((j-\frac12)(k-\frac12)\pi/(m-1))\, b_j$                         $T_{D_hN_h}^{-1} = \frac{2}{m-1} T_{D_hN_h}$
$(T_{N_hD_h}b)(k) = \sum_{j=1}^{m-1} \cos((j-\frac12)(k-\frac12)\pi/(m-1))\, b_j$                         $T_{N_hD_h}^{-1} = \frac{2}{m-1} T_{N_hD_h}$

Key:
D = standard Dirichlet            N = central difference Neumann
D_h = half grid line Dirichlet    N_h = half grid line Neumann
P = periodic

Table D.2: Transforms and Their Inverses
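
One of the inverse pairs of table D.2 can be spot-checked with a short sketch. It assumes the forward transform $\sum_{j=1}^{n}\cos((j-\frac12)k\pi/n)\,b_j$ for $k = 0,\ldots,n-1$ (that is, $T_{N_hN_h}$ with $n = m - 1$) and the inverse $\frac{2}{n}T_{ND}$ with a half weight on the $k = 0$ term, as in (D.87). Plain sums stand in for the FFT, and the function names are hypothetical.

```python
import math


def t_nhnh(b):
    """Forward shifted-cosine-type sum: b holds b_1..b_n, result holds k = 0..n-1."""
    n = len(b)
    return [sum(math.cos((j - 0.5) * k * math.pi / n) * b[j - 1]
                for j in range(1, n + 1))
            for k in range(n)]


def t_nd(bh):
    """T_ND-type sum with half weight on entry 0: bh holds k = 0..n-1, result j = 1..n."""
    n = len(bh)
    return [0.5 * bh[0] + sum(math.cos(k * (j - 0.5) * math.pi / n) * bh[k]
                              for k in range(1, n))
            for j in range(1, n + 1)]


data = [1.0, -2.0, 0.5, 3.0, 4.0, -1.5]
n = len(data)
round_trip = [(2.0 / n) * v for v in t_nd(t_nhnh(data))]
```

The round trip returns the original data to rounding error, consistent with the scale factor $2/n$ in (D.87).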
Tridiagonal Solvers

Instead of doing three transforms in (D.83), forwards then backwards, it is computationally
more efficient to do two forward transformations, a tridiagonal solver, then
two inverse transformations. For example, if the eigenvector decomposition is used
only in the x and z directions, then the following equation is obtained instead.

$$x = (E_x \otimes I_y \otimes E_z)\, y \tag{D.88}$$

where

$$y = \left( \frac{\Lambda_x \otimes I_y \otimes I_z}{\Delta x^2} + \frac{I_x \otimes A_y \otimes I_z}{\Delta y^2} + \frac{I_x \otimes I_y \otimes \Lambda_z}{\Delta z^2} \right)^{-1} b^{xz}$$

and

$$b^{xz} = (E_x^{-1} \otimes I_y \otimes E_z^{-1})\, b.$$

Formula (D.88) is equivalent to solving the following tridiagonal equation for every $\alpha$
and $\gamma$.

$$\left( \frac{A_y}{\Delta y^2} + \left[ \frac{\lambda_x(\alpha)}{\Delta x^2} + \frac{\lambda_z(\gamma)}{\Delta z^2} \right] I_y \right) x^{xz}(\alpha,\cdot,\gamma) = b^{xz}(\alpha,\cdot,\gamma) \tag{D.89}$$

The n-dimensional generalization of (D.88) and (D.89) is obtained by specifying an
order for the n transforms in the generalization of (D.83), then simply omitting the
last one.

A general purpose tridiagonal solver could be used, but the (almost) equal and
constant sub and super diagonals and the (almost) constant main diagonal permit
a more efficient implementation. The code uses a modification of the LINPACK
symmetric matrix algorithm, with vectorization in the x direction, unrolled loops in
y, and an outer loop in z. This way the FFT in x vectorizes over yz planes and the
FFT in z vectorizes over xy planes for the ordering in an xyz FORTRAN array.

The idea behind the algorithm is to do Gaussian elimination from the top and bot- 
tom simultaneously until the process meets in the middle. Then do back substitutions 
from the middle back to the top and bottom simultaneously. If the number of rows 
is even, there is a middle 2 by 2 block, which is solved directly. For an odd number 
of rows, both off diagonal values of the center row are eliminated simultaneously to 
get the middle value. 

To minimize the number of operations, especially the number of divisions, the
following technique is used for a problem with constant diagonal and off diagonal.
Define $a$ = main diagonal value, $c$ = off diagonal value, $a_j$ = $j$th row diagonal value
after the $(j-1)$st Gaussian step, $b_j$ = $j$th row right hand side, $d_j = c/a_j$, $d_0 = a/c$,
and $c_1 = 1/c$. Only the values $d_0$, $c_1$, and $d_j$ are actually computed and stored.
Then in the forward substitution

$$a_1 = a \quad \text{and} \quad a_{j+1} = a - c\,d_j \quad (\text{equivalently } d_{j+1} = 1/(d_0 - d_j)),$$

$$b_{j+1} := b_{j+1} - d_j\, b_j, \qquad b_{n-1-j} := b_{n-1-j} - d_j\, b_{n-j}.$$

In the back substitution

$$b_j := (b_j - c\,b_{j+1})/a_j = d_j(c_1 b_j - b_{j+1})$$

$$b_{n-j} := d_j(c_1 b_{n-j} - b_{n-1-j}).$$

In the case of a mixed Dirichlet-Neumann problem the diagonal values $a_j$, hence
$d_j$, are different for the top and bottom eliminations. Also the first step of the forward
substitution and the last step of the backward substitution must be specially coded.
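
The two-sided elimination can be sketched as follows for a constant diagonal $a$ and constant off diagonal $c$. This is a simplified illustration, not the report's vectorized code: it stores the modified diagonals directly instead of the ratios $d_j$, performs a division per row, and handles even and odd row counts uniformly by eliminating both neighbours of a single middle row rather than solving a 2 by 2 block. The function name is hypothetical.

```python
def solve_const_tridiag(a, c, rhs):
    """Solve T x = rhs, T tridiagonal with diagonal a and off diagonals c,
    by Gaussian elimination from the top and the bottom toward a middle row."""
    n = len(rhs)
    b = [float(v) for v in rhs]
    diag = [float(a)] * n
    mid = n // 2
    # Elimination sweeping down from the top: rows 1 .. mid-1.
    for j in range(1, mid):
        d = c / diag[j - 1]
        diag[j] -= d * c
        b[j] -= d * b[j - 1]
    # Elimination sweeping up from the bottom: rows n-2 .. mid+1.
    for j in range(n - 2, mid, -1):
        d = c / diag[j + 1]
        diag[j] -= d * c
        b[j] -= d * b[j + 1]
    # Middle row: remove the couplings to both neighbours at once.
    if mid - 1 >= 0:
        d = c / diag[mid - 1]
        diag[mid] -= d * c
        b[mid] -= d * b[mid - 1]
    if mid + 1 < n:
        d = c / diag[mid + 1]
        diag[mid] -= d * c
        b[mid] -= d * b[mid + 1]
    # Back substitution outward from the middle.
    x = [0.0] * n
    x[mid] = b[mid] / diag[mid]
    for j in range(mid - 1, -1, -1):
        x[j] = (b[j] - c * x[j + 1]) / diag[j]
    for j in range(mid + 1, n):
        x[j] = (b[j] - c * x[j - 1]) / diag[j]
    return x


# Dirichlet-type example: diagonal -2, off diagonal 1, exact solution 1, 2, ..., n.
x_even = solve_const_tridiag(-2.0, 1.0, [0, 0, 0, 0, 0, -7])
x_odd = solve_const_tridiag(-2.0, 1.0, [0, 0, 0, 0, -6])
```

Because the top and bottom sweeps are independent, they can proceed simultaneously, which is the point of meeting in the middle.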

D.3.2 Transform Algorithms 

Several common transform algorithms are used in the code and also several unusual
or specialized algorithms. All the algorithms consist of reductions to the standard
complex FFT. In most cases there is an intermediate reduction to real FFT's. The
simplest and most efficient method of coding real FFT's was to do a complex FFT on
a pair of real sequences, although a single-sequence real transform code would have
been simpler to use. Therefore most of the algorithms are actually applied to a pair
of sequences. All the algorithms are implemented for a matrix-type data structure so
that the transforms are performed in one direction with vectorization, or processing
in parallel, in the other direction.

First, let's review the standard real, sine, and cosine algorithms. A basic reference
is [95]. Let $b_0, \ldots, b_{n-1}$ and $c_0, \ldots, c_{n-1}$ be a pair of real sequences and set $a = b + ic$.
Let the DFT (Discrete Fourier Transform) be defined by

$$\hat a(k) = \mathrm{DFT}(a)(k) = \sum_{j=0}^{n-1} \exp\left\{ kj\,\frac{2\pi i}{n} \right\} a_j. \tag{D.90}$$

Then the paired real algorithm is

$$\begin{aligned}
\hat b(k) &= \tfrac12\left[\mathrm{Re}(\hat a(k)) + \mathrm{Re}(\hat a(n-k))\right] + \tfrac{i}{2}\left[\mathrm{Im}(\hat a(k)) - \mathrm{Im}(\hat a(n-k))\right] \\
\hat c(k) &= \tfrac12\left[\mathrm{Im}(\hat a(k)) + \mathrm{Im}(\hat a(n-k))\right] + \tfrac{i}{2}\left[-\mathrm{Re}(\hat a(k)) + \mathrm{Re}(\hat a(n-k))\right]
\end{aligned} \tag{D.91}$$

for $k = 1, \ldots, n/2$, with $\hat b(0) = \mathrm{Re}(\hat a(0))$, and $\hat c(0) = \mathrm{Im}(\hat a(0))$. Implicitly $\hat b(n-k) =
\overline{\hat b(k)}$ and $\hat c(n-k) = \overline{\hat c(k)}$.
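
The paired real algorithm (D.91) can be sketched with a direct $O(n^2)$ DFT standing in for the complex FFT. The sketch assumes the positive-exponent DFT convention of (D.90); the function names `dft` and `paired_real_dft` are hypothetical.

```python
import cmath
import math


def dft(a):
    """Direct DFT with the positive-exponent convention of (D.90)."""
    n = len(a)
    return [sum(a[q] * cmath.exp(2j * math.pi * k * q / n) for q in range(n))
            for k in range(n)]


def paired_real_dft(b, c):
    """Transform two real sequences with a single complex DFT of a = b + i c."""
    n = len(b)
    a_hat = dft([complex(bq, cq) for bq, cq in zip(b, c)])
    b_hat, c_hat = [], []
    for k in range(n):
        u = a_hat[k]
        v = a_hat[(n - k) % n].conjugate()
        b_hat.append(0.5 * (u + v))      # same as the first line of (D.91)
        c_hat.append((u - v) / 2j)       # same as the second line of (D.91)
    return b_hat, c_hat


b = [1.0, 2.0, 3.0, 4.0]
c = [0.0, -1.0, 5.0, 2.0]
b_hat, c_hat = paired_real_dft(b, c)
```

The two outputs coincide with the separate transforms `dft(b)` and `dft(c)`, at the cost of one complex transform plus a linear-time separation step.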

The sine transform algorithm is taken from Temperton [96]. The idea is to form
a certain real sequence, apply a real transform, then extract the sine transform from
the resulting data. The real sequence is

$$d_j = \sin\left(\frac{j\pi}{n}\right)(b_j + b_{n-j}) + \frac12 (b_j - b_{n-j})$$

for $1 \le j \le n-1$ with $d_0 = 0$. Let $\hat b = T_{DD} b$. Then

$$\begin{aligned}
\hat b(1) &= \tfrac12 \mathrm{Re}(\hat d(0)) \\
\hat b(2k+1) - \hat b(2k-1) &= \mathrm{Re}(\hat d(k)) \\
\hat b(2k) &= \mathrm{Im}(\hat d(k))
\end{aligned} \tag{D.92}$$

for $k = 1, \ldots, n/2$.

The cosine transform is a close variation on (D.92). Let

$$c_j = -\sin\left(\frac{j\pi}{n}\right)(b_j - b_{n-j}) + \frac12 (b_j + b_{n-j})$$

for $0 \le j \le n-1$. Let $\hat b = T_{NN} b$. Then

$$\begin{aligned}
\hat b(1) &= \tfrac12 (b_0 - b_n) + \sum_{j=1}^{n-1} \cos(j\pi/n)\, b_j \\
\hat b(2k+1) - \hat b(2k-1) &= \mathrm{Im}(\hat c(k)) \\
\hat b(2k) &= \mathrm{Re}(\hat c(k))
\end{aligned} \tag{D.93}$$

for $k = 1, \ldots, n/2$.

The three transform algorithms (D.91) - (D.93) may be combined to transform a
single, real, zero-extended sequence. This is a sequence $x = b_0, \ldots, b_n, 0, \ldots, 0$ of length
$2n$, which arises naturally in FFT convolutions. First double $b_0$ and $b_n$ so that

$$\hat x(k) = (T_{NN} b)(k) + i\,(T_{DD} b)(k).$$

Next compute the real sequences $d_j$ and $c_j$ as in (D.92) and (D.93). Now pair up $c_j$
and $d_j$ to compute $\hat c$ and $\hat d$ by (D.91). Then the sine and cosine transforms, hence $\hat x$,
are given by (D.92) and (D.93).
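
The identity behind the zero-extended transform can be checked directly. The sketch below assumes the definitions of $T_{NN}$ and $T_{DD}$ from table D.2 and the positive-exponent DFT of (D.90); it evaluates all sums directly rather than through the fast algorithms, and every function name is hypothetical.

```python
import cmath
import math


def dft(a):
    """Direct DFT with the positive-exponent convention of (D.90)."""
    n = len(a)
    return [sum(a[q] * cmath.exp(2j * math.pi * k * q / n) for q in range(n))
            for k in range(n)]


def t_nn(b):
    """Cosine transform with half weights on b_0 and b_n; b holds b_0..b_n."""
    n = len(b) - 1
    return [0.5 * b[0]
            + sum(math.cos(q * k * math.pi / n) * b[q] for q in range(1, n))
            + 0.5 * (-1) ** k * b[n]
            for k in range(n + 1)]


def t_dd(b):
    """Sine transform over the interior entries b_1..b_{n-1}."""
    n = len(b) - 1
    return [sum(math.sin(q * k * math.pi / n) * b[q] for q in range(1, n))
            for k in range(n + 1)]


b = [1.0, 2.0, -1.0, 0.5, 3.0]            # b_0 .. b_n with n = 4
n = len(b) - 1
x_hat = dft(b + [0.0] * (n - 1))          # zero-extended to length 2n
doubled = [2.0 * b[0]] + b[1:n] + [2.0 * b[n]]
cos_part, sin_part = t_nn(doubled), t_dd(doubled)
```

For $k = 0, \ldots, n$ the real part of `x_hat[k]` equals `cos_part[k]` and the imaginary part equals `sin_part[k]`, which is the relation the combined algorithm exploits.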

Note that a shifted cosine algorithm could be implemented by the symmetric data
argument (D.87). Similar algorithms exist for the sine, cosine, and shifted sine.
However in all these cases the DFT is of length $2n$, twice as long as it need be. A
suitable shifted cosine algorithm is actually somewhat easier to derive than the sine
and cosine algorithms cited above. To compute $\hat b = T_{N_hN_h} b$ first define $c_j := b_{2j}$.
Then note that $c_{n-j} = b_{2n+1-(2j+1)} = b_{2j+1}$ and that $b_0 = b_1 = b_{2n}$ by the inverse $T_{ND}$
formula in table D.2. That is, $c$ may be computed as

$$\begin{aligned}
c_0 &= b_1 \\
c_j &= b_{2j} && \text{for } j = 1, \ldots, n/2 \\
c_{n-j} &= b_{2j+1} && \text{for } j = 1, \ldots, (n-1)/2.
\end{aligned} \tag{D.94}$$


Now write

$$\begin{aligned}
c_j &= \frac{1}{n}\left[ \hat b(0) + \sum_{k=1}^{n-1} \left( \exp\left\{ -(2j-\tfrac12)k\frac{\pi i}{n} \right\} \hat b(k)
+ \exp\left\{ (2j-\tfrac12)(n-k)\frac{\pi i}{n} \right\} \hat b(n-k) \right) \right] \\
&= \frac{1}{n} \sum_{k=0}^{n-1} \exp\left\{ -jk\,\frac{2\pi i}{n} \right\} z_k
\end{aligned}$$

where

$$z_0 = \hat b(0) \quad \text{and} \quad z_k = \exp\left\{ k\frac{\pi i}{2n} \right\} \left( \hat b(k) - i\,\hat b(n-k) \right) \tag{D.95}$$

for $k = 1, \ldots, n-1$. Thus to compute $\hat b$, first apply a real transform to $c$ to get
$z$. Then scale $z$ by $\exp\{-k\pi i/2n\}$ for $1 \le k \le n/2$ and take the real part to get
$\hat b(k)$ and the imaginary part to get $-\hat b(n-k)$. To compute $T_{ND}$ simply reverse this
procedure: compute $z$ by (D.95) for $k = 1, \ldots, n/2$ and apply the inverse of the real
transform (D.91) to get $c$, hence $b$ by (D.94). An alternative algorithm is given by
Swarztrauber [94].

The shifted sine algorithm is completely analogous to the shifted cosine algorithm. 
The difference is only that in (D.95), 6(0) is replaced by (-l)*" 1 ^) and 6(Jfc) - ib(n- 
k) is replaced by (■ -b(k ) - i6(n-*))/i = -b(n - k) + ib(k) . Equivalently, the relation 
in (D.85) could be used to compute the shifted sine transform from the shifted cosine 
transform, or vice versa. 

Efficient dual shifted sine and cosine algorithms may be derived by natural modifications
of the shifted sine and cosine algorithms. For the dual shifted cosine transform
$T_{N_hD_h}$, again define $c$ as in (D.94) and modify (D.95) to get


* = n£ C ° S (2j ' 


- “’(t) 


where 

= exp {(* - j)~} (b(k) - ib(n + 1 - *)) (D.96) 

for $1 \le k \le n$ with $z_0 = z_n$. The problem now is that a transform must be applied
to the complex sequence $\exp\{-j\pi i/n\}\,c_j$ to solve for $z$. That is, the paired real
transform (D.91) can no longer be used, resulting in twice as much work as necessary.
The solution is to apply the complex FFT directly to a pair of problems in analogy
with algorithm (D.91). Thus set

exp 




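where $c^1$ and $c^2$ are defined by (D.94) from $b^1$ and $b^2$. Then transform $c$ to get $z^1 + iz^2$
so that

$$\begin{aligned}
\hat b^1(k) &= \tfrac12 \left( \mathrm{Re}(y_k) - \mathrm{Im}(y_{n+1-k}) \right) \\
\hat b^2(k) &= \tfrac12 \left( \mathrm{Im}(y_k) + \mathrm{Re}(y_{n+1-k}) \right)
\end{aligned} \tag{D.97}$$

for

$$\begin{aligned}
y_k &= \exp\left\{ -(k-\tfrac12)\frac{\pi i}{2n} \right\} (z^1_k + i z^2_k) \\
&= \left( \hat b^1(k) + \hat b^2(n+1-k) \right) + i \left( -\hat b^1(n+1-k) + \hat b^2(k) \right).
\end{aligned}$$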

Just as with the shifted sine transform, there is a dual shifted sine transform
analogous to the dual shifted cosine transform. The resulting formulas that replace
(D.97) are

$$\begin{aligned}
\hat b^1(k) &= \tfrac12 \left( \mathrm{Im}(y_k) - \mathrm{Re}(y_{n+1-k}) \right) \\
\hat b^2(k) &= -\tfrac12 \left( \mathrm{Re}(y_k) + \mathrm{Im}(y_{n+1-k}) \right)
\end{aligned} \tag{D.98}$$

for

$$y_k = -\left( \hat b^1(n+1-k) + \hat b^2(k) \right) + i \left( \hat b^1(k) - \hat b^2(n+1-k) \right).$$

Again, the relation in (D.85) may be used instead.

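One more transform is useful for FFT convolutions with symmetric data. Let $x =
b_0, \ldots, b_{n-1}, b_n, b_{n-1}, \ldots, b_0, 0, \ldots, 0$ be a symmetric, zero-extended sequence of length $4n$.
An efficient algorithm is derived for the DFT as follows.

$$\begin{aligned}
\hat x(k) &= \exp\left\{ kn\frac{\pi i}{2n} \right\} b_n + \sum_{j=0}^{n-1} \left[ \exp\left\{ kj\frac{\pi i}{2n} \right\} + \exp\left\{ k(2n-j)\frac{\pi i}{2n} \right\} \right] b_j \\
&= i^k b_n + \sum_{j=0}^{n-1} \left[ \exp\left\{ kj\frac{\pi i}{2n} \right\} + (-1)^k \exp\left\{ -kj\frac{\pi i}{2n} \right\} \right] b_j \\
&= \begin{cases}
2\sum_{j=0}^{n-1} \cos(k'j\pi/n)\, b_j + (-1)^{k'} b_n & \text{for } k = 2k' \\
2i\sum_{j=1}^{n-1} \sin((k'-\frac12)j\pi/n)\, b_j + (-1)^{k'-1}\, i\, b_n & \text{for } k = 2k'-1
\end{cases}
\end{aligned}$$

Thus if $b_0$ is doubled, the even and odd $k$ values are given by cosine and shifted sine
transforms respectively:

$$\begin{aligned}
\hat x(2k) &= 2\,(T_{NN} b)(k) \\
\hat x(2k-1) &= 2i\,(T_{DN} b)(k).
\end{aligned} \tag{D.99}$$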
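
Relation (D.99) can be confirmed numerically with a small sketch. It assumes the $T_{NN}$ and $T_{DN}$ definitions of table D.2 and the positive-exponent DFT of (D.90), evaluates everything by direct sums, and uses hypothetical function names.

```python
import cmath
import math


def dft(a):
    """Direct DFT with the positive-exponent convention of (D.90)."""
    n = len(a)
    return [sum(a[q] * cmath.exp(2j * math.pi * k * q / n) for q in range(n))
            for k in range(n)]


def t_nn(b, k):
    """Cosine transform value at k, half weights on b_0 and b_n."""
    n = len(b) - 1
    return (0.5 * b[0]
            + sum(math.cos(q * k * math.pi / n) * b[q] for q in range(1, n))
            + 0.5 * (-1) ** k * b[n])


def t_dn(b, k):
    """Shifted sine transform value at k (k = 1..n), half weight on b_n."""
    n = len(b) - 1
    return (sum(math.sin(q * (k - 0.5) * math.pi / n) * b[q] for q in range(1, n))
            + 0.5 * (-1) ** (k - 1) * b[n])


b = [1.0, 2.0, -1.0, 0.5, 3.0]                    # b_0 .. b_n with n = 4
n = len(b) - 1
x = b + b[n - 1::-1] + [0.0] * (2 * n - 1)        # symmetric, zero-extended, length 4n
x_hat = dft(x)
b_doubled = [2.0 * b[0]] + b[1:]                  # "if b_0 is doubled"

even_err = max(abs(x_hat[2 * k] - 2.0 * t_nn(b_doubled, k)) for k in range(n + 1))
odd_err = max(abs(x_hat[2 * k - 1] - 2j * t_dn(b_doubled, k)) for k in range(1, n + 1))
```

Both error measures are at rounding level, so a length $4n$ DFT of such data reduces to one cosine and one shifted sine transform of length $n$.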

Real arithmetical operation counts, as coded, for the transform algorithms described
above are given in table D.3.

Length N Transforms                                   Real Operation Count

Complex FFT                                           $(5 \log_2 N)N$
Paired Real                                           $(5 + 5 \log_2 N)N$
Paired Sine                                           $(13 + 5 \log_2 N)N$
Paired Cosine                                         $(14 + 5 \log_2 N)N$
Single real with zero-extended data†                  $(14 + 5 \log_2 N)N$
Paired real with symmetric, zero-extended data‡       $(30 + 10 \log_2 N)N$
Paired shifted sine or cosine                         $(12 + 5 \log_2 N)N$
Paired dual shifted sine or cosine                    $(18 + 5 \log_2 N)N$

†Data is $b_0, \ldots, b_N, 0, \ldots, 0$ (length $2N$).

‡Data is $b_0, \ldots, b_{N-1}, b_N, b_{N-1}, \ldots, b_0, 0, \ldots, 0$ (length $4N$).

Table D.3: Transform Operation Counts

D.3.3 Implementation of the James Algorithm

The basic theory of the algorithm described in section D.1.2 is straightforward, but
several complications arise during the implementation. These come from the handling
of the 6 planes that constitute the boundary of the computational box $R$, especially
in phase 2 (boundary convolution). In addition, the presence of one or more planes
of symmetry entails major modifications.

The present implementation differs considerably from the description of James [89],
whose method uses complex manipulations of sine and cosine transforms to conserve
memory during the boundary convolution. We simplify the process by combining these
transforms into standard real and complex transforms. Several extra scratch planes are
used to achieve major gains in vectorization, as well as simplification.

At the highest level the algorithm is divided into three separate, but parallel,
paths for (1) no planes of symmetry (2) only a Y plane of symmetry (3) both Y and
Z planes of symmetry. If there is only a Z plane of symmetry, the Y and Z data are
interchanged before and after doing the Y only algorithm. Within paths (2) and (3)
there is a choice as to whether a plane of symmetry is located on the left (origin) or
right. For example, if the computational box is represented by

$$R = [0, m_x] \times [0, m_y] \times [0, m_z]$$

with coordinates $i, j, k$, then a Y plane of symmetry may be specified either by $j = 0$
or $j = m_y$. The algorithm assumes the $j = m_y$ case, and simply reverses the Y data
before and after if it is the $j = 0$ case. The algorithm could be formulated the other
way around just as easily.


Phase 1 

The basic method used to solve the Dirichlet problem (D.71) for the interior solution
$\theta$ is to use X and Z sine transforms and a tridiagonal solver, as in formula (D.88).
However, instead of completing the interior solution in phase 1, the inverse X and Z
sine transforms are delayed to phase 3, where they are combined with the inverse X
and Z transforms for $\psi$. This is possible because phase 2 requires only the boundary
charge function $\sigma$, which in turn needs only $\theta$ values next to the edge of the box. For
example, the computation of $\sigma$ on the $i = 0$ plane, denoted by $\sigma_{yz0}$, reduces to

$$\begin{aligned}
\sigma_{yz0}(j,k) &= Q(j,k) - (D^2\theta)(0,j,k) \\
&= Q(j,k) - \theta(1,j,k)/\Delta x^2
\end{aligned} \tag{D.100}$$

since $\theta$ is zero outside $R$ and on $\partial R$.

Using $\alpha, \beta, \gamma$ to indicate sine transformed coordinates, $\theta(1,j,\gamma)$ can be computed
explicitly by

$$\theta(1,j,\gamma) = \frac{2}{m_x} \sum_{\alpha=1}^{m_x-1} \sin\left( \alpha\frac{\pi}{m_x} \right) \theta(\alpha,j,\gamma). \tag{D.101}$$

Then an inverse sine transform in Z is applied to get $\theta(1,j,k)$. Similarly

$$\theta(m_x-1,j,\gamma) = \frac{2}{m_x} \sum_{\alpha=1}^{m_x-1} (-1)^{\alpha-1} \sin\left( \alpha\frac{\pi}{m_x} \right) \theta(\alpha,j,\gamma). \tag{D.102}$$

Actually it is better to replace (D.101) and (D.102) by

$$\begin{aligned}
e(j,\gamma) &= \frac{2}{m_x} \sum_{\alpha=1}^{n_e} \sin\left( 2\alpha\frac{\pi}{m_x} \right) \theta(2\alpha,j,\gamma) \\
o(j,\gamma) &= \frac{2}{m_x} \sum_{\alpha=1}^{n_o} \sin\left( (2\alpha-1)\frac{\pi}{m_x} \right) \theta(2\alpha-1,j,\gamma) \\
\theta(1,j,\gamma) &= o(j,\gamma) + e(j,\gamma) \\
\theta(m_x-1,j,\gamma) &= o(j,\gamma) - e(j,\gamma)
\end{aligned} \tag{D.103}$$

for $n_e = (m_x - 1)/2$ and $n_o = m_x/2$. Of course the sine factors in (D.40) are precomputed.
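
The even/odd splitting of (D.103) can be sketched for a single line of sine-transformed data. The sketch assumes the inverse sine transform $\frac{2}{m}\sum_\alpha \sin(\alpha i \pi/m)\,\theta(\alpha)$ of table D.2 and integer division for $n_e$ and $n_o$; the function names and test data are hypothetical.

```python
import math


def inverse_sine_at(theta_hat, i):
    """Direct inverse sine transform at grid index i; theta_hat holds alpha = 1..m-1."""
    m = len(theta_hat) + 1
    return (2.0 / m) * sum(math.sin(alpha * i * math.pi / m) * theta_hat[alpha - 1]
                           for alpha in range(1, m))


def edge_values(theta_hat):
    """Values next to both edges, i = 1 and i = m-1, from shared even/odd sums."""
    m = len(theta_hat) + 1
    sines = [math.sin(alpha * math.pi / m) for alpha in range(1, m)]  # precomputed once
    even = (2.0 / m) * sum(sines[a - 1] * theta_hat[a - 1] for a in range(2, m, 2))
    odd = (2.0 / m) * sum(sines[a - 1] * theta_hat[a - 1] for a in range(1, m, 2))
    return odd + even, odd - even


theta_hat = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1, 0.9]   # m = 8, alpha = 1..7
first, last = edge_values(theta_hat)
```

Each product with a sine factor is formed once and used for both edges, which is the saving over evaluating (D.101) and (D.102) separately.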

If there is a Z plane of symmetry, the major difference is that Z sine transforms are
replaced by shifted sine transforms, according to table D.2, since the symmetry plane
(= zero Neumann boundary condition) is assumed to be on the right. This also means
that the sine term in (D.101) is replaced by the shifted sine term $\sin((\alpha - \frac12)\pi/m_z)$
when computing $\theta(\alpha, j, 1)$.

If there is a Y plane of symmetry, the major difference is in the tridiagonal solver.
By (D.73) there is a 2 in the subdiagonal of the bottom row, so the algorithm described
in section D.3.1 is modified accordingly.


Phase 2 

The basic idea of the boundary convolution is to zero extend the box $R$, doubling its
size in each dimension, in order to compute $\psi$ on $\partial R$ by

$$\psi = G * \sigma + G_d * \tau = (\hat G \hat\sigma + \hat G_d \hat\tau)^{\vee}. \tag{D.104}$$

According to (D.41) the downstream source convolution $G_d * \tau$ is to be interpreted
in the following way. Let $F_x$ = forward DFT in X, etc. Then

$$\begin{aligned}
(G_d * \tau)(i,j,k) &= (G_d(i,\cdot,\cdot)) * \tau(j,k) \\
&= F_y^{-1}F_z^{-1}\left( (F_x^{-1}\hat G_d)(F_yF_z\tau) \right) \\
&= F_x^{-1}F_y^{-1}F_z^{-1}\left( \hat G_d\,(F_yF_z\tau) \right).
\end{aligned} \tag{D.105}$$

This means that FFT's are first applied to the 6 boundary planes of $\sigma$. Then, in a
loop over the Z index, an XY plane of $\sigma$ is assembled from the 6 transformed planes.
At the same time an XY plane of $G$ is assembled from its 3 plane representation (D.37)
and a plane of $G_d$ from its 4 plane representation (D.49). Then the multiplications
and additions in (D.104) are done to get $\psi$. Still inside the Z loop, some of the
inverse DFT's are performed explicitly to get 6 boundary planes. After the Z loop,
the remaining inverse DFT's are applied to these boundary planes to get $\psi|_{\partial R}$.

Let's look at the transformation of the YZ boundary planes, $\sigma_{yz0}$ and $\sigma_{yz1}$, in more
detail. Since these are real, we may start with a real transform in Z. Next apply a
complex transform in Y. The transform in X reduces to a butterfly operation (addition
and subtraction), with the sum of the two planes giving the even frequencies and the
difference the odd frequencies:

$$\sigma_{yz}(\beta,\gamma,p_\alpha) = \sigma_{yz0}(\beta,\gamma) + (-1)^{p_\alpha}\,\sigma_{yz1}(\beta,\gamma) \tag{D.106}$$

for $p_\alpha = \alpha \bmod 2$.

If there is a Y plane of symmetry, the Y transform becomes a transform of symmetric,
zero-extended data with frequencies from 0 to $2m_y$. By (D.99) the odd frequencies
may be represented as nominally real values, like the even frequencies, but with an
implicit scaling by $i$. If this is applied to the single XZ boundary plane, the result is
just scaling by 2 for even frequencies and zero for odd frequencies:

$$\sigma_{xz}(\alpha,\gamma,p_\beta) = \begin{cases} 2\,\sigma_{xz}(\alpha,\gamma) & \text{for } \beta \text{ even} \\ 0 & \text{for } \beta \text{ odd.} \end{cases} \tag{D.107}$$

The same methods apply to the transforms and inverse transforms of the other boundary
planes.
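
The butterfly of (D.106) follows from the fact that, along X, the zero-extended data are nonzero only at $i = 0$ and $i = m_x$. A minimal sketch, assuming a DFT of length $2m_x$ with the positive-exponent convention of (D.90) and hypothetical names and sample values:

```python
import cmath
import math


def dft(a):
    """Direct DFT with the positive-exponent convention of (D.90)."""
    n = len(a)
    return [sum(a[q] * cmath.exp(2j * math.pi * k * q / n) for q in range(n))
            for k in range(n)]


m_x = 5
sigma0, sigma1 = 1.5 - 0.5j, -2.0 + 3.0j   # one (beta, gamma) entry of each YZ plane

# Line of data along X in the doubled box: nonzero only on the two boundary planes.
line = [0j] * (2 * m_x)
line[0] = sigma0
line[m_x] = sigma1

full = dft(line)
butterfly = [sigma0 + (-1) ** (alpha % 2) * sigma1 for alpha in range(2 * m_x)]
```

Every frequency of the full transform equals either the sum or the difference of the two plane values, so no X transform is needed for these planes.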

By (D.106) an XY plane of $\sigma$ may be assembled by the following formula:

$$\sigma(\alpha,\beta,\gamma) = \sigma_{yz}(\beta,\gamma,p_\alpha) + \sigma_{xz}(\alpha,\gamma,p_\beta) + \sigma_{xy}(\alpha,\beta,p_\gamma). \tag{D.108}$$

One way to do this is by adding the XY and XZ planes while vectorizing in X:

$$t(\alpha,\beta) = \sigma_{xz}(\alpha,\gamma,p_\beta) + \sigma_{xy}(\alpha,\beta,p_\gamma). \tag{D.109}$$

Then complete (D.108) and at the same time use (D.104) and (D.105) to do the
Green's function multiplications, taking advantage of operation chaining while vectorizing
in Y:

$$\psi(\alpha,\beta,\gamma) = \left( \sigma_{yz}(\beta,\gamma,p_\alpha) + t(\alpha,\beta) \right) G(\alpha,\beta,\gamma) + \tau(\beta,\gamma)\,G_d(\alpha,\beta,\gamma) \tag{D.110}$$

If there is a Y plane of symmetry, (D.107) shows that (D.109) is just a copy for
odd $\beta$ and addition of the single XZ plane for even $\beta$. If there is also a Z plane of
symmetry, (D.109) is only addition for $\beta$ and $\gamma$ both even, and is zero if both are odd.
Furthermore, in (D.110) $\sigma_{yz}$ and $\tau$ are nominally real, with only implicit scaling by $i$
for odd $\beta$ and $\gamma$. Thus the nominally imaginary part of (D.110) reduces to

$$\mathrm{Im}(\psi(\alpha,\beta,\gamma)) := \mathrm{Im}(t(\alpha,\beta))\,G(\alpha,\beta,\gamma). \tag{D.111}$$

According to the Z inverse DFT formula, the boundary XY planes may be computed
from $\psi$ by butterflying the sums of the even and odd Z index values:

$$\begin{aligned}
e(\alpha,\beta) &= \frac{1}{2m_z} \sum_{\gamma=0}^{m_z-1} \psi(\alpha,\beta,2\gamma) \\
o(\alpha,\beta) &= \frac{1}{2m_z} \sum_{\gamma=0}^{m_z-1} \psi(\alpha,\beta,2\gamma+1)
\end{aligned} \tag{D.112}$$

$$\begin{aligned}
\psi_{xy0}(\alpha,\beta) &= e(\alpha,\beta) + o(\alpha,\beta) \\
\psi_{xy1}(\alpha,\beta) &= e(\alpha,\beta) - o(\alpha,\beta)
\end{aligned} \tag{D.113}$$

The sums in (D.112) are accumulated in the Z loop with the butterflies and scaling
done later as an inverse operation to the butterflies (D.106).

A way to gain efficiency is to let the Z loop run only from 1 to $m_z - 1$ and set

$$\psi(\alpha,\beta,2m_z-\gamma) = \psi(2m_x-\alpha,\,2m_y-\beta,\,\gamma), \tag{D.114}$$

which follows directly from the DFT formula. In the case of Y symmetry (D.114)
may be replaced by

$$\psi(\alpha,\beta,2m_z-\gamma) = (-1)^{\beta}\,\psi(2m_x-\alpha,\,\beta,\,\gamma) \tag{D.115}$$

according to formula (D.99). In case of Z symmetry, the loop runs from 0 through
$2m_z$ and (D.114) may be replaced by

$$\psi(\alpha,\beta,2m_z-\gamma) = (-1)^{\gamma}\,\psi(\alpha,\beta,\gamma). \tag{D.116}$$

That is, the odd frequencies sum to zero and the even ones are doubled except at the
endpoints 0 and $2m_z$.

The boundary XZ and XY planes are computed by the same principle of summing
even and odd index values, but with complete computation of an X or Y line done all
at once each time through the Z loop. For example, if there is a Y plane of symmetry,
the even and odd $\beta$ sums of $\psi$ reduce to just doubling the even values, except for
endpoints, by the Y symmetry analog of (D.116).

Phase 3 

The method used in phase 1 is applied again in phase 3, but specialized to zero
interior sources. This means that, for example, the two YZ boundary planes $\psi_{yz0}$
and $\psi_{yz1}$ are converted to equivalent sources at positions $i = 1$ and $i = m_x - 1$
respectively by scaling by $-1/\Delta x^2$, according to (D.71). Now sine transform in Z to
get $\gamma$ coordinates. Next, the sine transform in X to get $\alpha$ coordinates reduces to the
following.


MaJn ,p a ) = ^sin(a-^-)(0 yxO (;,7) + (-l)“V’yzi(;,7)) (D.117) 

for $p_\alpha = \alpha \bmod 2$. The butterfly operations in (D.117) are done before the Z loop,
and the precomputed sine factors are multiplied inside the loop. There is a formula
analogous to (D.117) for the transformed XY boundary planes $\psi_{xy}$. The
XZ boundary planes require full X and Z sine transforms plus scaling by $-1/\Delta y^2$.
Denote them by $\psi_{xz}(\alpha,\gamma,0)$ and $\psi_{xz}(\alpha,\gamma,1)$. Then inside the Z loop the transforms
of the 6 boundary planes are summed to get an XY plane $t$ as follows.




‘ifryzipt-i ji 7, Pa) + + 


Hj - l)tM<*,7,0) + £(j - m y - l)Vtr*(a,7,l). 


(D.118) 





In the case of Y symmetry, there is only one XZ plane. In the case of Z symmetry
there is only one XY plane, and shifted sine transforms are used. The tridiagonal
solver is applied to the plane $t$, and the result is added to the corresponding plane of
$\phi(\alpha, j, \gamma)$, saved from phase 1.

After the Z loop, the inverse X and Z transforms are applied to get $\phi$ in the interior
of the box, and the original 6 boundary planes are copied onto its boundary.
Appendix E

SPARSE SOLVER 


In this appendix, the general purpose sparse solver that is used to factor the sparse 
matrix is discussed. The sparse matrix is used as a left preconditioner in the solution
of the discrete equations (see Section 2.4). The sparse solver was designed for
general usage to solve much larger problems than are feasible with existing sparse 
matrix software. The solver has a general input capability allowing contributions 
to matrix elements to be entered in any order. These contributions are sorted and 
combined to produce the final matrix. This feature is particularly convenient with 
finite elements, where element stiffness matrices can be generated in any order. The 
solver is out-of-core so that quite large problems can be solved on current comput- 
ers. Gaussian elimination is performed by block rows, additional blocks being created 
as fill is generated. The sparse solver takes full advantage of the hardware features 
on Cray computers including gather/scatter, vector compress, and large out-of-core 
memory afforded by the SSD on the Cray X-MP or Cray Y-MP. Considerable atten- 
tion has been devoted to making all phases of setting up and solving matrix problems 
convenient and efficient. A description of these phases including ordering the ma- 
trix elements to minimize the fill-in, matrix assembly, matrix decomposition, and the 
forward and backward substitution is presented. 


E.1 NESTED DISSECTION ORDERING

For large sparse problems, a matrix decomposition preconditioner is practical only if 
the decomposition is also sparse. This is true not only because of storage limitations, 
but also because of the CPU time required for forward and back substitution. One 
key to maintaining sparsity is a good permutation ordering for the rows and columns 
of the matrix. For sparse matrices resulting from standard discretizations of elliptic
partial differential equations on uniform rectangular grids, nested dissection has been
shown to be asymptotically optimal [97].

In TRANAIR a physically based version of nested dissection suitable for grids
with local refinements has been implemented. One advantage of this method is that
it does not require an examination of the graph of the matrix. The algorithm acts
recursively on subsets of nodes (grid points). In TRANAIR, the discretization used
near boundary surfaces can locate several solution unknowns at the same grid node
location. The first such subset is the set of all nodes in the reduced set. For a
set of nodes $\mathcal{N}$ the algorithm finds a set of nodes $\mathcal{N}_d$ called a dissector, so that
$\mathcal{N} = \mathcal{N}_d \cup \mathcal{N}_l \cup \mathcal{N}_r$, where $\mathcal{N}_l$ consists of nodes on one side of $\mathcal{N}_d$ and $\mathcal{N}_r$ those on
the other. The dissector has the property that an unknown at any node on one side
has a stencil that does not include unknowns located at nodes on the other side.
The permutation is produced by ordering the nodes in the dissector last. Figure E.1
shows the block structure of this matrix. The blocks of zeros remain intact, preserving
sparsity during the decomposition. For a structured grid a plane of points forms a
suitable dissector for a standard 27 point stencil.

Figure E.1: Block Structure of a Sparse Matrix Ordered with Nested Dissection.

In TRANAIR, dissectors are generated by first taking a cutting plane perpendicular
to a coordinate axis and finding the set $B$ of all boxes intersecting this plane by
interrogating the oct-tree data structure (see Appendix A). Taking the case when
the cutting plane is perpendicular to the x axis, the dissector consists of all nodes on
the left hand (negative x) face of boxes in $B$ that are also in $\mathcal{N}$. This will provide a
dissector except near pseudo-nodes where the stencil is altered (see Section 2.3.5).
When a pseudo-node is in the dissector its parent nodes must also be included in
the dissector. Figure E.2 shows examples (in two dimensions) of cutting planes and
corresponding sets of nodes in the dissectors.

Figure E.2: Examples of Cutting Planes and Nodes in Resulting Dissector.

There are two guiding principles that aid in producing an effective nested dissection
ordering. The first is that the components $\mathcal{N}_l$ and $\mathcal{N}_r$ resulting from the dissection
be of approximately equal size. The second is that the dissectors contain as few
unknowns as possible. (The size of the dissectors can vary due to local refinement
and the location of boundaries.) These principles can conflict and some compromise
is necessary. The cutting plane is selected to have x coordinate equal to that of the
node in $\mathcal{N}$ with median x coordinate. Dissectors perpendicular to each of the three
coordinate axes are tested and the one yielding the smallest dissector is chosen. The
process is repeated recursively on the newly formed components resulting from each
dissection until all remaining components contain fewer than 50 nodes.
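
The recursion described above can be illustrated with a simplified sketch. It assumes a uniform structured grid whose nodes are integer (i, j, k) tuples and a 27 point stencil, so that a single plane of nodes is a valid dissector; local refinement, pseudo-nodes, and the supersonic modification are ignored. The function name `nested_dissection` and the parameter `min_size` are illustrative and are not taken from TRANAIR.

```python
def nested_dissection(nodes, min_size=50):
    """Return the (i, j, k) tuples in `nodes` in a nested dissection order."""
    if len(nodes) < min_size:
        return list(nodes)                  # small component: stop recursing
    best = None
    for axis in range(3):
        coords = sorted(n[axis] for n in nodes)
        cut = coords[len(coords) // 2]      # coordinate of the median node
        dissector = [n for n in nodes if n[axis] == cut]
        # Keep the axis whose cutting plane gives the smallest dissector.
        if best is None or len(dissector) < len(best[1]):
            best = (axis, dissector, cut)
    axis, dissector, cut = best
    left = [n for n in nodes if n[axis] < cut]
    right = [n for n in nodes if n[axis] > cut]
    # Order both components first and the dissector last.
    return (nested_dissection(left, min_size)
            + nested_dissection(right, min_size)
            + dissector)

grid = [(i, j, k) for i in range(8) for j in range(8) for k in range(8)]
order = nested_dissection(grid)
```

Because the dissector always contains at least the median node, both components are strictly smaller than the parent set and the recursion terminates.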

The above algorithm is modified when regions of supersonic flow are present in 
the full potential case because upwinding of the density enlarges the stencil (see 
Section 2.3.7). If the cutting plane intersects a box which contains supersonic flow or 
is adjacent to such a box then all the nodes of the box are included in the dissector. 


E.2 MATRIX ASSEMBLY 

Contributions to the global stiffness matrix are generated on an element by element
basis, i.e., element stiffness matrices are input one by one. This order is unrelated to
the ordering of the unknowns used for the decomposition. Thus, the contributions
must be sorted and coalesced to produce the final global stiffness matrix. This process
consists of four steps and is illustrated in the case of two blocks and a 3 by 3 matrix
in Figure E.3.

Each contribution is described by a numerical value (denoted by a letter) and by 
row and column indices. The four steps are as follows: 

• 1. All contributions are collected in equal sized blocks and stored out of core. 

• 2. Each block is returned to main memory and sorted by row index, resulting 
in an ordered sequence of row groups within each block. A row group is a group 
of contributions all having the same row index. A simple bucket sort algorithm 
[98] seems to be the most efficient for this purpose. 

• 3. These sorted blocks are then merged into ordered chains of blocks. An 
ordered chain is a chain of blocks such that all contributions in a given block 
have row indices less than or equal to those in all subsequent contributions in 
that block and the remaining blocks of the chain. Merging two chains consists 
in interleaving their row groups so that the resulting chain is also sorted by row 
index. At a given stage, the two shortest chains are always merged. Ultimately, 
the result is a single chain of blocks. Because the contributions are not sorted by 
column index at this stage, all movement of elements can be done by row group. 
This allows vectorization of the merge algorithm. 

• 4. All contributions to the same matrix element are then coalesced, i.e., within 
each row group, contributions with common column indices are added to form a 
single contribution. Coalescing can be done without first sorting each row group 
by column index. 
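
The four steps above can be sketched in a few lines. This is an in-memory illustration only: the out-of-core storage and the vectorized merge are not modeled, contributions are assumed to be (row, column, value) tuples, the last block may be shorter than the others, and the names `sort_block_by_row` and `assemble` are hypothetical.

```python
from heapq import merge
from itertools import groupby

def sort_block_by_row(block, n_rows):
    """Step 2: bucket sort one block of (row, col, value) tuples by row."""
    buckets = [[] for _ in range(n_rows)]
    for row, col, value in block:
        buckets[row].append((row, col, value))
    return [c for bucket in buckets for c in bucket]

def assemble(contributions, n_rows, block_size=4):
    # Step 1: collect the contributions into blocks of equal size.
    blocks = [contributions[i:i + block_size]
              for i in range(0, len(contributions), block_size)]
    chains = [sort_block_by_row(b, n_rows) for b in blocks]
    # Step 3: always merge the two shortest chains, comparing row index only.
    while len(chains) > 1:
        chains.sort(key=len)
        merged = list(merge(chains[0], chains[1], key=lambda c: c[0]))
        chains = chains[2:] + [merged]
    chain = chains[0] if chains else []
    # Step 4: coalesce within each row group; columns need not be sorted.
    matrix = {}
    for row, group in groupby(chain, key=lambda c: c[0]):
        cols = {}
        for _, col, value in group:
            cols[col] = cols.get(col, 0.0) + value
        matrix[row] = cols
    return matrix

contributions = [(2, 0, 1.0), (0, 0, 2.0), (1, 1, 3.0), (0, 0, 0.5), (2, 2, 4.0),
                 (1, 0, -1.0), (0, 1, 1.5), (2, 0, 2.0), (1, 1, 1.0)]
result = assemble(contributions, n_rows=3)
```

The merge compares row indices only, so whole row groups move together, which mirrors the remark in step 3 that column order is irrelevant at that stage.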

Note that most elements of the global stiffness matrix have contributions from 
8 element stiffness matrices in subsonic regions and as many as 64 in supersonic 
regions. Thus, the number of contributions may be up to 125 times greater than the 
number of elements in the assembled global stiffness matrix. In order to minimize 
storage requirements, the above four step process is performed repeatedly. (It can 
be performed at any point in the generation of contributions to the stiffness matrix.) 
When this is done, chains which have already been formed are not merged until new 
chains of equal size exist. Row groups are sorted by column index using a bucket sort 
only after completion of contribution input and coalescing. 

E.3 MATRIX DECOMPOSITION 


During the decomposition phase, the matrix is stored in a row format. For the
purposes of transferring information between main memory and the SSD, the matrix
is partitioned into row blocks. For each element of the matrix, two storage locations
are used, one to store the element and the other to store its column index. Optionally,
the matrix element and the column index can be packed into one word of storage. In
this mode, the 64-bit storage location devotes 43 bits to the matrix element and 21
bits to its column index. Word packing reduces the SSD storage required to hold the
matrix decomposition. Moreover, the CPU time required to perform the packing and
unpacking is compensated for by the reduced time spent referencing memory. As a
result, the word packed version of the sparse solver actually runs slightly faster than
the unpacked version.

Figure E.3: Sorting and Merging Procedure.
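
The 43/21 bit split can be illustrated with a short sketch. It assumes IEEE 754 double precision values, whereas the Cray X-MP and Y-MP used their own floating point format, so this shows the idea rather than the actual bit layout. The low 21 mantissa bits are overwritten by the column index; the names `pack` and `unpack` are illustrative.

```python
import struct

INDEX_BITS = 21
INDEX_MASK = (1 << INDEX_BITS) - 1          # low 21 bits hold the column index

def pack(value, col):
    """Pack a float and a column index (< 2**21) into one 64-bit integer."""
    if not 0 <= col <= INDEX_MASK:
        raise ValueError("column index does not fit in 21 bits")
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (bits & ~INDEX_MASK) | col       # truncate mantissa, insert index

def unpack(word):
    """Recover the (truncated) float and the column index from a packed word."""
    col = word & INDEX_MASK
    value = struct.unpack("<d", struct.pack("<Q", word & ~INDEX_MASK))[0]
    return value, col

word = pack(3.141592653589793, 123456)
value, col = unpack(word)
```

Under this assumption the column index is recovered exactly, while the matrix element keeps 31 of its 52 stored mantissa bits, which is usually acceptable for a factorization that serves only as a preconditioner.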

Decomposition of the matrix is accomplished by Gaussian elimination. (For sparse 
matrix problems arising in TRANAIR no pivoting has yet been found necessary for 
numerical stability.) Each element in the lower triangle is eliminated in turn through 
a sparse SAXPY (SPAXPY) operation with the appropriate row. The multiplier 
becomes the corresponding element of the L matrix, and the appropriate U matrix 
row is modified by the SPAXPY operation. Each row block is decomposed, and 
when finished, used to eliminate corresponding lower triangular elements from all 
subsequent row blocks. As fill-in occurs the row blocks must be repartitioned and 
new row blocks added. An input/output package has been developed that does this 
automatically so that formally the code need only fetch or store any given row. When 
all lower triangular elements of a given row have been eliminated, elements in
the upper triangular part of the row can be dropped if they are small relative to
the current row or column diagonal for that element. The criterion chosen depends 
on whether the problem is deemed well scaled row-wise or column-wise. The lower 
triangular elements are dropped in a similar fashion as they are eliminated, obviating 
the need for a SPAXPY operation. 
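
A row-wise elimination of this kind, without pivoting, can be sketched as follows. Rows are assumed to be dictionaries mapping column index to value, the row blocks and out-of-core storage are not modeled, and the particular drop tests (lower elements against the column diagonal, upper elements against the current row diagonal) are one assumed choice among those the text allows. The name `sparse_lu` is illustrative.

```python
def sparse_lu(rows, drop_tol=0.0):
    """Return (L, U) as lists of {col: value}; L has an implicit unit diagonal."""
    n = len(rows)
    L = [dict() for _ in range(n)]
    U = [dict() for _ in range(n)]
    for i in range(n):
        work = dict(rows[i])                    # working copy of row i
        while True:
            # Explicit search for the next nonzero left of the diagonal.
            lower = [c for c in work if c < i]
            if not lower:
                break
            k = min(lower)
            a_ik = work.pop(k)
            pivot = U[k][k]
            if abs(a_ik) < drop_tol * abs(pivot):
                continue                        # dropped: no sparse AXPY needed
            mult = a_ik / pivot
            L[i][k] = mult
            for c, v in U[k].items():           # sparse AXPY: row_i -= mult * row_k
                if c > k:
                    work[c] = work.get(c, 0.0) - mult * v
        diag = work[i]
        # Drop small upper-triangular entries relative to the row diagonal.
        U[i] = {c: v for c, v in work.items()
                if c == i or abs(v) >= drop_tol * abs(diag)}
    return L, U

A = [{0: 4.0, 1: 1.0, 2: 1.0}, {0: 1.0, 1: 4.0}, {0: 1.0, 2: 4.0}]
L, U = sparse_lu(A)                     # exact factorization, with fill-in
L_drop, U_drop = sparse_lu(A, drop_tol=0.1)
```

With `drop_tol=0.0` the fill-in entries at positions (1, 2) and (2, 1) are retained and the product of the factors reproduces the matrix; with `drop_tol=0.1` both are discarded and the factors keep the sparsity pattern of the original matrix.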

Unlike many sparse matrix solvers, there is no reliance on a symbolic factorization
to facilitate the matrix decomposition. Instead, an explicit search is carried out to
find the nonzero elements created during the matrix decomposition. This strategy
allows the easy implementation of drop tolerances as required for the large problems
discussed in the following sections.

As pointed out in the sparse matrix literature (for example, [99]), searching for 
nonzero elements would be prohibitively expensive on a normal scalar machine. For- 
tunately, the hardware vector mask and vector compress feature on the Cray X-MP 
allows us to perform the searches efficiently. For the largest problems solved to date 
involving extensive use of drop tolerances (those problems incurring the maximum 
penalty for nonzero searching), about 40 percent of the total decomposition cost is 
taken by the searches for the next nonzero element below the diagonal to be elimi- 
nated. Another 40 percent of the cost of the decomposition is taken by the SPAXPY 
operations which would be required whether or not a symbolic factorization was avail- 
able. The final 20 percent is taken by searching for the nonzero elements in the upper 
part of the matrix created by the SPAXPY operations. (The number of searches of 
this type depends on the size of blocks that can be held in core relative to the size 
of the LU decomposition. This 20 percent can be thought of as the penalty for being
out-of-core.) Thus in the worst case, for the moderately sparse matrices encountered
in applications, the searches for nonzero elements increase the cost by about a factor 
of two. This cost is more than offset by the ability to introduce a drop tolerance. 

E.4 FORWARD/BACKWARD SUBSTITUTION

The final phase is the forward/backward substitution. For problems requiring solu- 
tions for many right-hand sides, the substitution phase can be more expensive than 
the decomposition phase. Therefore, it is important to minimize the cost of the 
forward/backward substitution phase. For the short vector lengths characteristic of
sparse matrix operations, a sparse vector dot product typically takes twice as long as
a SPAXPY on the Cray X-MP. To take advantage of the relative speed of SPAXPY
operations during the solution phase, instead of solving $Ax = b$, the equivalent system
$x^T A^T = b^T$ is solved. Specification of the transposed problem is easily accomplished by
transposing the row and column indices as they are collected.
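
The difference between the two formulations can be seen in a small sketch of forward substitution with a unit lower triangular factor. The row-oriented form needs one sparse dot product per row, while the column-oriented form, which is what storing the transposed problem by rows amounts to, needs one sparse AXPY per column. The dictionary storage and the names `forward_dot` and `forward_axpy` are assumptions made for illustration.

```python
def forward_dot(L_rows, b):
    """Row-oriented: x[i] = b[i] - dot(L[i, :i], x[:i])."""
    x = list(b)
    for i, row in enumerate(L_rows):        # row = {j: L[i][j] for j < i}
        x[i] -= sum(v * x[j] for j, v in row.items())
    return x

def forward_axpy(L_cols, b):
    """Column-oriented: once x[j] is final, update x[i] -= L[i][j] * x[j]."""
    x = list(b)
    for j, col in enumerate(L_cols):        # col = {i: L[i][j] for i > j}
        xj = x[j]
        for i, v in col.items():            # sparse AXPY on the remaining entries
            x[i] -= v * xj
    return x

# The same unit lower triangular matrix stored by rows and by columns.
L_rows = [{}, {0: 0.5}, {0: 0.25, 1: -2.0}]
L_cols = [{1: 0.5, 2: 0.25}, {2: -2.0}, {}]
b = [2.0, 3.0, 4.0]
x_dot = forward_dot(L_rows, b)
x_axpy = forward_axpy(L_cols, b)
```

Both routines return the same solution; only the inner loop changes from a reduction to an update, which is the operation the text reports as faster for short vectors.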

E.5 PERFORMANCE 

Typically, the matrices have between 30,000 and 300,000 rows with 20-30 nonzeros 
per row in subsonic flow. In regions of supersonic flow, there are 100-120 nonzeros 
per row. This increase is due to the larger operator stencil necessary to include the 
upwinding needed to rule out expansion shocks [22]. Without the use of a drop toler- 
ance, significant fill-in occurs during the decomposition yielding a decomposed matrix 
with 10-30 times the number of nonzero entries in the original matrix. For many large 
problems, the memory required by a full decomposition would be too large for the 
SSD. Because the matrix is used as a preconditioner to solve the problem iteratively, 
it is not essential that the decomposition be exact. During the decomposition
elements are dropped when they are less than a specified fraction (the drop tolerance)
of the current diagonal element. With a suitable choice of drop tolerance, CPU time
and memory use for the decomposition are reduced by up to an order of magnitude
with only a slight degradation in convergence rate.

Table E.1: Performance Characteristics for the Sparse Solver with No Drop Tolerance.
Ten to Twenty Nonlinear Newton Steps are Required for each Solution. Each
Linearized Solution Requires about 10 GMRES Iterations.

    Matrix       Decomp     CPU sec          Decomp
    equations    CPU sec    per iteration    (MW)
    ---------    -------    -------------    ------
    10,018          20            1             3
    30,267         280            3            11
    36,627         230            4            16
    50,655         620            3            29
    63,069         420            6            26



Table E.2: Performance Characteristics for the Sparse Solver with Drop Tolerance.
Each Linearized Solution Requires About 20-40 GMRES Iterations.

    Total        Matrix       Drop         Decomp     CPU sec          Decomp
    equations    equations    tolerance    CPU sec    per iteration    (MW)
    ---------    ---------    ---------    -------    -------------    ------
     15,376       11,514      0.0010           7          0.2            1.5
     24,304       15,051      0.0010          11          0.4            2.2
     37,702       18,731      0.0010          13          0.4            2.4
     61,863       44,537      0.0010          41          0.9            6.7
    124,875       81,216      0.0005          91          1.3           12.6
    236,970      167,500      0.0010         246          2.8           26.4
    247,703      156,659      0.0008         241          3.3           28.6
    268,301      192,238      0.0010         311          3.1           30.8
    284,052      181,802      0.0008         297          4.1           30.6
    288,333      207,827      0.0010         405          3.5           40.0
    373,813      241,474      0.0010         631          5.8           54.0
    480,907      330,857      0.0008         882          6.8           64.6


Tables E.1 and E.2 give the computer times for decomposition of the reduced
set matrix for several representative cases run with TRANAIR, with and without a drop
tolerance, as well as the computer time required for each GMRES iteration. The
storage required for the decomposition (using no word packing) is also shown and is
equal to twice the number of nonzero entries in the decomposition.

Table E.2 gives both the number of equations in the reduced set (the size of the
problem given to the sparse solver) and the total number of degrees of freedom.
The size of the sparse matrix varies from about 10,000 unknowns to around 330,000
unknowns. Note that with the use of a drop tolerance, the decomposition costs are
reduced by nearly an order of magnitude. The costs per iteration are also reduced by
about a factor of two. For each linear problem, 10-20 iterations are required to converge
the solution when no drop tolerance is used. By using a drop tolerance in the range
0.001-0.0001 these numbers are increased to 20-40 iterations. However, since the
cost per iteration has been reduced by a factor of two, this helps to compensate for
the increase in number of iterations. For drop tolerances in this range the size of the
decomposition is between two and five times the size of the original matrix.

Figure E.4: Cost versus drop tolerance for the ONERA M6 TRANAIR solution.
(Plot title: Drop Tolerance Study, Coarse Grid ONERA M6 Wing Test Case,
Total Solver CPU Time vs. Drop Tolerance.)

Memory limitations make it impossible to test the effect of a full range of drop
tolerances for a large problem. However, a small problem with 28,050 finite
elements serves to illustrate the effect of drop tolerance on the decomposition and
overall solution costs. Figure E.4 illustrates total solution costs as a function of drop
tolerance for a fluid dynamics problem (an ONERA M6 wing in transonic flow).

Figure E.5 illustrates the SSD resource requirements. It is clear that there is a
minimum in the total cost for drop tolerances in the range of $10^{-4}$. When higher drop
tolerances are used, the decomposition costs continue to decrease, but the iterative
costs begin to increase due to the larger number of iterations required to reach a
given level of convergence. For this configuration, a drop tolerance in the range of
$10^{-3}$ produces a slight increase in CPU time over the optimal value, but (Fig. E.5)
significantly reduces the amount of SSD storage required (by more than a factor of
two). CPU time must be balanced against the SSD storage requirements. Thus
some experimentation may be required to determine an appropriate value of drop
tolerance. For larger problems, the amount of SSD storage may become the critical
limiting factor. The reduction in SSD storage for larger problems can be as high as
a factor of 20 with an appropriate choice of drop tolerance.

Figure E.5: SSD storage versus drop tolerance for the ONERA M6 TRANAIR solution.